1. 28 10月, 2016 3 次提交
    • J
      perf/powerpc: Don't call perf_event_disable() from atomic context · 5aab90ce
      Jiri Olsa 提交于
      The trinity syscall fuzzer triggered following WARN() on powerpc:
      
        WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
        ...
        NIP [c00000000093aedc] .hw_breakpoint_handler+0x28c/0x2b0
        LR [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0
        Call Trace:
        [c0000002f7933580] [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
        [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
        [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
        [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
        [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
        [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48
      
      Followed by a lockdep warning:
      
        ===============================
        [ INFO: suspicious RCU usage. ]
        4.8.0-rc5+ #7 Tainted: G        W
        -------------------------------
        ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section!
      
        other info that might help us debug this:
      
        rcu_scheduler_active = 1, debug_locks = 0
        2 locks held by ls/2998:
         #0:  (rcu_read_lock){......}, at: [<c0000000000f6a00>] .__atomic_notifier_call_chain+0x0/0x1c0
         #1:  (rcu_read_lock){......}, at: [<c00000000093ac50>] .hw_breakpoint_handler+0x0/0x2b0
      
        stack backtrace:
        CPU: 9 PID: 2998 Comm: ls Tainted: G        W       4.8.0-rc5+ #7
        Call Trace:
        [c0000002f7933150] [c00000000094b1f8] .dump_stack+0xe0/0x14c (unreliable)
        [c0000002f79331e0] [c00000000013c468] .lockdep_rcu_suspicious+0x138/0x180
        [c0000002f7933270] [c0000000001005d8] .___might_sleep+0x278/0x2e0
        [c0000002f7933300] [c000000000935584] .mutex_lock_nested+0x64/0x5a0
        [c0000002f7933410] [c00000000023084c] .perf_event_ctx_lock_nested+0x16c/0x380
        [c0000002f7933500] [c000000000230a80] .perf_event_disable+0x20/0x60
        [c0000002f7933580] [c00000000093aeec] .hw_breakpoint_handler+0x29c/0x2b0
        [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
        [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
        [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
        [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
        [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48
      
      While it looks like the first WARN() is probably valid, the other one is
      triggered by disabling event via perf_event_disable() from atomic context.
      
      The event is disabled here in case we were not able to emulate
      the instruction that hit the breakpoint. By disabling the event
      we unschedule the event and make sure it's not scheduled back.
      
      But we can't call perf_event_disable() from atomic context, instead
      we need to use the event's pending_disable irq_work method to disable it.
      Reported-by: NJan Stancek <jstancek@redhat.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161026094824.GA21397@kravaSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5aab90ce
    • M
      kconfig.h: remove config_enabled() macro · c0a0aba8
      Masahiro Yamada 提交于
      The use of config_enabled() is ambiguous.  For config options,
      IS_ENABLED(), IS_REACHABLE(), etc.  will make intention clearer.
      Sometimes config_enabled() has been used for non-config options because
      it is useful to check whether the given symbol is defined or not.
      
      I have been tackling on deprecating config_enabled(), and now is the
      time to finish this work.
      
      Some new users have appeared for v4.9-rc1, but it is trivial to replace
      them:
      
       - arch/x86/mm/kaslr.c
        replace config_enabled() with IS_ENABLED() because
        CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.
      
       - include/asm-generic/export.h
        replace config_enabled() with __is_defined().
      
      Then, config_enabled() can be removed now.
      
      Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
      options, and __is_defined() for non-config symbols.
      
      Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NNicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c0a0aba8
    • L
      mm: remove per-zone hashtable of bitlock waitqueues · 9dcb8b68
      Linus Torvalds 提交于
      The per-zone waitqueues exist because of a scalability issue with the
      page waitqueues on some NUMA machines, but it turns out that they hurt
      normal loads, and now with the vmalloced stacks they also end up
      breaking gfs2 that uses a bit_wait on a stack object:
      
           wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)
      
      where 'gh' can be a reference to the local variable 'mount_gh' on the
      stack of fill_super().
      
      The reason the per-zone hash table breaks for this case is that there is
      no "zone" for virtual allocations, and trying to look up the physical
      page to get at it will fail (with a BUG_ON()).
      
      It turns out that I actually complained to the mm people about the
      per-zone hash table for another reason just a month ago: the zone lookup
      also hurts the regular use of "unlock_page()" a lot, because the zone
      lookup ends up forcing several unnecessary cache misses and generates
      horrible code.
      
      As part of that earlier discussion, we had a much better solution for
      the NUMA scalability issue - by just making the page lock have a
      separate contention bit, the waitqueue doesn't even have to be looked at
      for the normal case.
      
      Peter Zijlstra already has a patch for that, but let's see if anybody
      even notices.  In the meantime, let's fix the actual gfs2 breakage by
      simplifying the bitlock waitqueues and removing the per-zone issue.
      Reported-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Tested-by: NBob Peterson <rpeterso@redhat.com>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9dcb8b68
  2. 26 10月, 2016 1 次提交
    • D
      x86/io: add interface to reserve io memtype for a resource range. (v1.1) · 8ef42276
      Dave Airlie 提交于
      A recent change to the mm code in:
      87744ab3 mm: fix cache mode tracking in vm_insert_mixed()
      
      started enforcing checking the memory type against the registered list for
      amixed pfn insertion mappings. It happens that the drm drivers for a number
      of gpus relied on this being broken. Currently the driver only inserted
      VRAM mappings into the tracking table when they came from the kernel,
      and userspace mappings never landed in the table. This led to a regression
      where all the mapping end up as UC instead of WC now.
      
      I've considered a number of solutions but since this needs to be fixed
      in fixes and not next, and some of the solutions were going to introduce
      overhead that hadn't been there before I didn't consider them viable at
      this stage. These mainly concerned hooking into the TTM io reserve APIs,
      but these API have a bunch of fast paths I didn't want to unwind to add
      this to.
      
      The solution I've decided on is to add a new API like the arch_phys_wc
      APIs (these would have worked but wc_del didn't take a range), and
      use them from the drivers to add a WC compatible mapping to the table
      for all VRAM on those GPUs. This means we can then create userspace
      mapping that won't get degraded to UC.
      
      v1.1: use CONFIG_X86_PAT + add some comments in io.h
      
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: x86@kernel.org
      Cc: mcgrof@suse.com
      Cc: Dan Williams <dan.j.williams@intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      8ef42276
  3. 25 10月, 2016 1 次提交
    • L
      mm: unexport __get_user_pages() · 0d731759
      Lorenzo Stoakes 提交于
      This patch unexports the low-level __get_user_pages() function.
      
      Recent refactoring of the get_user_pages* functions allow flags to be
      passed through get_user_pages() which eliminates the need for access to
      this function from its one user, kvm.
      
      We can see that the two calls to get_user_pages() which replace
      __get_user_pages() in kvm_main.c are equivalent by examining their call
      stacks:
      
        get_user_page_nowait():
          get_user_pages(start, 1, flags, page, NULL)
          __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
      			    false, flags | FOLL_TOUCH)
          __get_user_pages(current, current->mm, start, 1,
      		     flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)
      
        check_user_page_hwpoison():
          get_user_pages(addr, 1, flags, NULL, NULL)
          __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
      			    false, flags | FOLL_TOUCH)
          __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
      		     NULL, NULL)
      Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0d731759
  4. 21 10月, 2016 1 次提交
    • R
      clocksource: Add J-Core timer/clocksource driver · 9995f4f1
      Rich Felker 提交于
      At the hardware level, the J-Core PIT is integrated with the interrupt
      controller, but it is represented as its own device and has an
      independent programming interface. It provides a 12-bit countdown
      timer, which is not presently used, and a periodic timer. The interval
      length for the latter is programmable via a 32-bit throttle register
      whose units are determined by a bus-period register. The periodic
      timer is used to implement both periodic and oneshot clock event
      modes; in oneshot mode the interrupt handler simply disables the timer
      as soon as it fires.
      
      Despite its device tree node representing an interrupt for the PIT,
      the actual irq generated is programmable, not hard-wired. The driver
      is responsible for programming the PIT to generate the hardware irq
      number that the DT assigns to it.
      
      On SMP configurations, J-Core provides cpu-local instances of the PIT;
      no broadcast timer is needed. This driver supports the creation of the
      necessary per-cpu clock_event_device instances.
      
      A nanosecond-resolution clocksource is provided using the J-Core "RTC"
      registers, which give a 64-bit seconds count and 32-bit nanoseconds
      that wrap every second. The driver converts these to a full-range
      32-bit nanoseconds count.
      Signed-off-by: NRich Felker <dalias@libc.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: devicetree@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Link: http://lkml.kernel.org/r/b591ff12cc5ebf63d1edc98da26046f95a233814.1476393790.git.dalias@libc.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      9995f4f1
  5. 20 10月, 2016 8 次提交
  6. 19 10月, 2016 9 次提交
  7. 18 10月, 2016 2 次提交
  8. 17 10月, 2016 1 次提交
  9. 16 10月, 2016 1 次提交
    • D
      kprobes: Unpoison stack in jprobe_return() for KASAN · 9f7d416c
      Dmitry Vyukov 提交于
      I observed false KSAN positives in the sctp code, when
      sctp uses jprobe_return() in jsctp_sf_eat_sack().
      
      The stray 0xf4 in shadow memory are stack redzones:
      
      [     ] ==================================================================
      [     ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480
      [     ] Read of size 1 by task syz-executor/18535
      [     ] page:ffffea00017923c0 count:0 mapcount:0 mapping:          (null) index:0x0
      [     ] flags: 0x1fffc0000000000()
      [     ] page dumped because: kasan: bad access detected
      [     ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28
      [     ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      [     ]  ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8
      [     ]  ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000
      [     ]  ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370
      [     ] Call Trace:
      [     ]  [<ffffffff82d2b849>] dump_stack+0x12e/0x185
      [     ]  [<ffffffff817d3169>] kasan_report+0x489/0x4b0
      [     ]  [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20
      [     ]  [<ffffffff82d49529>] memcmp+0xe9/0x150
      [     ]  [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0
      [     ]  [<ffffffff817d2031>] save_stack+0xb1/0xd0
      [     ]  [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0
      [     ]  [<ffffffff817d05b8>] kfree+0xc8/0x2a0
      [     ]  [<ffffffff85b03f19>] skb_free_head+0x79/0xb0
      [     ]  [<ffffffff85b0900a>] skb_release_data+0x37a/0x420
      [     ]  [<ffffffff85b090ff>] skb_release_all+0x4f/0x60
      [     ]  [<ffffffff85b11348>] consume_skb+0x138/0x370
      [     ]  [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180
      [     ]  [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70
      [     ]  [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0
      [     ]  [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0
      [     ]  [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190
      [     ]  [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20
      [ ... ]
      [     ] Memory state around the buggy address:
      [     ]  ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]  ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]                    ^
      [     ]  ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]  ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ] ==================================================================
      
      KASAN stack instrumentation poisons stack redzones on function entry
      and unpoisons them on function exit. If a function exits abnormally
      (e.g. with a longjmp like jprobe_return()), stack redzones are left
      poisoned. Later this leads to random KASAN false reports.
      
      Unpoison stack redzones in the frames we are going to jump over
      before doing actual longjmp in jprobe_return().
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: kasan-dev@googlegroups.com
      Cc: surovegin@google.com
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9f7d416c
  10. 15 10月, 2016 2 次提交
  11. 14 10月, 2016 1 次提交
    • M
      vfs: add vfs_get_link() helper · d60874cd
      Miklos Szeredi 提交于
      This helper is for filesystems that want to read the symlink and are better
      off with the get_link() interface (returning a char *) rather than the
      readlink() interface (copy into a userspace buffer).
      
      Also call the LSM hook for readlink (not get_link) since this is for
      symlink reading not following.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      d60874cd
  12. 13 10月, 2016 2 次提交
    • T
      net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON · b8a4ddb2
      Tom Herbert 提交于
      I am hitting this in mlx5:
      
      drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function
      reclaim_pages_cmd.clone.0:
      drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:346: error: call
      to __compiletime_assert_346 declared with attribute error:
      BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_out, pas[i]) % 64
      drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function give_pages:
      drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:291: error: call
      to __compiletime_assert_291 declared with attribute error:
      BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_in, pas[i]) % 64
      
      Problem is that this is doing a BUILD_BUG_ON on a non-constant
      expression because of trying to take offset of pas[i] in the
      structure.
      
      Fix is to create MLX5_ARRAY_SET64 that takes an additional argument
      that is the field index to separate between BUILD_BUG_ON on the array
      constant field and the indexed field to assign the value to.
      There are two callers of MLX5_SET64 that are trying to get a variable
      offset, change those to call MLX5_ARRAY_SET64 passing 'pas' and 'i'
      as the arguments to use in the offset check and the indexed value
      assignment.
      
      Fixes: a533ed5e ("net/mlx5: Pages management commands via mlx5 ifc")
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8a4ddb2
    • A
      cpufreq: skip invalid entries when searching the frequency · 899bb664
      Aaro Koskinen 提交于
      Skip invalid entries when searching the frequency. This fixes cpufreq
      at least on loongson2 MIPS board.
      
      Fixes: da0c6dc0 (cpufreq: Handle sorted frequency tables more efficiently)
      Signed-off-by: NAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      899bb664
  13. 12 10月, 2016 8 次提交
    • M
      mm: split gfp_mask and mapping flags into separate fields · 9c5d760b
      Michal Hocko 提交于
      mapping->flags currently encodes two different things into a single flag.
      It contains sticky gfp_mask for page cache allocations and AS_ codes used
      to report errors/enospace and other states which are mapping specific.
      Condensing the two semantically unrelated things saves few bytes but it
      also complicates other things.  For one thing the gfp flags space is
      reduced and in fact we are already running out of available bits.  It can
      be assumed that more gfp flags will be necessary later on.
      
      To not introduce the address_space grow (at least on x86_64) we can stick
      it right after private_lock because we have a hole there.
      
      struct address_space {
              struct inode *             host;                 /*     0     8 */
              struct radix_tree_root     page_tree;            /*     8    16 */
              spinlock_t                 tree_lock;            /*    24     4 */
              atomic_t                   i_mmap_writable;      /*    28     4 */
              struct rb_root             i_mmap;               /*    32     8 */
              struct rw_semaphore        i_mmap_rwsem;         /*    40    40 */
              /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
              long unsigned int          nrpages;              /*    80     8 */
              long unsigned int          nrexceptional;        /*    88     8 */
              long unsigned int          writeback_index;      /*    96     8 */
              const struct address_space_operations  * a_ops;  /*   104     8 */
              long unsigned int          flags;                /*   112     8 */
              spinlock_t                 private_lock;         /*   120     4 */
      
              /* XXX 4 bytes hole, try to pack */
      
              /* --- cacheline 2 boundary (128 bytes) --- */
              struct list_head           private_list;         /*   128    16 */
              void *                     private_data;         /*   144     8 */
      
              /* size: 152, cachelines: 3, members: 14 */
              /* sum members: 148, holes: 1, sum holes: 4 */
              /* last cacheline: 24 bytes */
      };
      
      Link: http://lkml.kernel.org/r/20160912114852.GI14524@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c5d760b
    • M
      treewide: remove redundant #include <linux/kconfig.h> · 97139d4a
      Masahiro Yamada 提交于
      Kernel source files need not include <linux/kconfig.h> explicitly
      because the top Makefile forces to include it with:
      
        -include $(srctree)/include/linux/kconfig.h
      
      This commit removes explicit includes except the following:
      
        * arch/s390/include/asm/facilities_src.h
        * tools/testing/radix-tree/linux/kernel.h
      
      These two are used for host programs.
      
      Link: http://lkml.kernel.org/r/1473656164-11929-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97139d4a
    • J
      kthread: add kerneldoc for kthread_create() · e154ccc8
      Jonathan Corbet 提交于
      This macro is referenced in other kerneldoc comments, but lacks one of its
      own; fix that.
      
      Link: http://lkml.kernel.org/r/20160826072313.726a3485@lwn.netSigned-off-by: NJonathan Corbet <corbet@lwn.net>
      Reported-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e154ccc8
    • P
      kthread: better support freezable kthread workers · dbf52682
      Petr Mladek 提交于
      This patch allows to make kthread worker freezable via a new @flags
      parameter. It will allow to avoid an init work in some kthreads.
      
      It currently does not affect the function of kthread_worker_fn()
      but it might help to do some optimization or fixes eventually.
      
      I currently do not know about any other use for the @flags
      parameter but I believe that we will want more flags
      in the future.
      
      Finally, I hope that it will not cause confusion with @flags member
      in struct kthread. Well, I guess that we will want to rework the
      basic kthreads implementation once all kthreads are converted into
      kthread workers or workqueues. It is possible that we will merge
      the two structures.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.comSigned-off-by: NPetr Mladek <pmladek@suse.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dbf52682
    • P
      kthread: allow to modify delayed kthread work · 9a6b06c8
      Petr Mladek 提交于
      There are situations when we need to modify the delay of a delayed kthread
      work. For example, when the work depends on an event and the initial delay
      means a timeout. Then we want to queue the work immediately when the event
      happens.
      
      This patch implements kthread_mod_delayed_work() as inspired workqueues.
      It cancels the timer, removes the work from any worker list and queues it
      again with the given timeout.
      
      A very special case is when the work is being canceled at the same time.
      It might happen because of the regular kthread_cancel_delayed_work_sync()
      or by another kthread_mod_delayed_work(). In this case, we do nothing and
      let the other operation win. This should not normally happen as the caller
      is supposed to synchronize these operations a reasonable way.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.comSigned-off-by: NPetr Mladek <pmladek@suse.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a6b06c8
    • P
      kthread: allow to cancel kthread work · 37be45d4
      Petr Mladek 提交于
      We are going to use kthread workers more widely and sometimes we will need
      to make sure that the work is neither pending nor running.
      
      This patch implements cancel_*_sync() operations as inspired by
      workqueues.  Well, we are synchronized against the other operations via
      the worker lock, we use del_timer_sync() and a counter to count parallel
      cancel operations.  Therefore the implementation might be easier.
      
      First, we check if a worker is assigned.  If not, the work has newer been
      queued after it was initialized.
      
      Second, we take the worker lock.  It must be the right one.  The work must
      not be assigned to another worker unless it is initialized in between.
      
      Third, we try to cancel the timer when it exists.  The timer is deleted
      synchronously to make sure that the timer call back is not running.  We
      need to temporary release the worker->lock to avoid a possible deadlock
      with the callback.  In the meantime, we set work->canceling counter to
      avoid any queuing.
      
      Fourth, we try to remove the work from a worker list. It might be
      the list of either normal or delayed works.
      
      Fifth, if the work is running, we call kthread_flush_work().  It might
      take an arbitrary time.  We need to release the worker-lock again.  In the
      meantime, we again block any queuing by the canceling counter.
      
      As already mentioned, the check for a pending kthread work is done under a
      lock.  In compare with workqueues, we do not need to fight for a single
      PENDING bit to block other operations.  Therefore we do not suffer from
      the thundering storm problem and all parallel canceling jobs might use
      kthread_flush_work().  Any queuing is blocked until the counter gets zero.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.comSigned-off-by: NPetr Mladek <pmladek@suse.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      37be45d4
    • P
      kthread: initial support for delayed kthread work · 22597dc3
      Petr Mladek 提交于
      We are going to use kthread_worker more widely and delayed works
      will be pretty useful.
      
      The implementation is inspired by workqueues.  It uses a timer to queue
      the work after the requested delay.  If the delay is zero, the work is
      queued immediately.
      
      In compare with workqueues, each work is associated with a single worker
      (kthread).  Therefore the implementation could be much easier.  In
      particular, we use the worker->lock to synchronize all the operations with
      the work.  We do not need any atomic operation with a flags variable.
      
      In fact, we do not need any state variable at all.  Instead, we add a list
      of delayed works into the worker.  Then the pending work is listed either
      in the list of queued or delayed works.  And the existing check of pending
      works is the same even for the delayed ones.
      
      A work must not be assigned to another worker unless reinitialized.
      Therefore the timer handler might expect that dwork->work->worker is valid
      and it could simply take the lock.  We just add some sanity checks to help
      with debugging a potential misuse.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.comSigned-off-by: NPetr Mladek <pmladek@suse.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22597dc3
    • P
      kthread: add kthread_destroy_worker() · 35033fe9
      Petr Mladek 提交于
      The current kthread worker users call flush() and stop() explicitly.
      This function does the same plus it frees the kthread_worker struct
      in one call.
      
      It is supposed to be used together with kthread_create_worker*() that
      allocates struct kthread_worker.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.comSigned-off-by: NPetr Mladek <pmladek@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      35033fe9