1. 07 2月, 2017 4 次提交
  2. 01 2月, 2017 21 次提交
  3. 31 1月, 2017 1 次提交
    • S
      tracing: Fix hwlat kthread migration · 79c6f448
      Steven Rostedt (VMware) 提交于
      The hwlat tracer creates a kernel thread at start of the tracer. It is
      pinned to a single CPU and will move to the next CPU after each period of
      running. If the user modifies the migration thread's affinity, it will not
      change after that happens.
      
      The original code created the thread at the first instance it was called,
      but later was changed to destroy the thread after the tracer was finished,
      and would not be created until the next instance of the tracer was
      established. The code that initialized the affinity was only called on the
      initial instantiation of the tracer. After that, it was not initialized, and
      the previous affinity did not match the current newly created one, making
      it appear that the user modified the thread's affinity when it did not, and
      the thread failed to migrate again.
      
      Cc: stable@vger.kernel.org
      Fixes: 0330f7aa ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      79c6f448
  4. 30 1月, 2017 6 次提交
  5. 27 1月, 2017 2 次提交
    • T
      cgroup: don't online subsystems before cgroup_name/path() are operational · 07cd1294
      Tejun Heo 提交于
      While refactoring cgroup creation, a5bca215 ("cgroup: factor out
      cgroup_create() out of cgroup_mkdir()") incorrectly onlined subsystems
      before the new cgroup is associated with it kernfs_node.  This is fine
      for cgroup proper but cgroup_name/path() depend on the associated
      kernfs_node and if a subsystem makes the new cgroup_subsys_state
      visible, which they're allowed to after onlining, it can lead to NULL
      dereference.
      
      The current code performs cgroup creation and subsystem onlining in
      cgroup_create() and cgroup_mkdir() makes the cgroup and subsystems
      visible afterwards.  There's no reason to online the subsystems early
      and we can simply drop cgroup_apply_control_enable() call from
      cgroup_create() so that the subsystems are onlined and made visible at
      the same time.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: a5bca215 ("cgroup: factor out cgroup_create() out of cgroup_mkdir()") 
      Cc: stable@vger.kernel.org # v4.6+
      07cd1294
    • E
      sysctl: fix proc_doulongvec_ms_jiffies_minmax() · ff9f8a7c
      Eric Dumazet 提交于
      We perform the conversion between kernel jiffies and ms only when
      exporting kernel value to user space.
      
      We need to do the opposite operation when value is written by user.
      
      Only matters when HZ != 1000
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff9f8a7c
  6. 25 1月, 2017 2 次提交
  7. 24 1月, 2017 1 次提交
    • N
      userns: Make ucounts lock irq-safe · 880a3854
      Nikolay Borisov 提交于
      The ucounts_lock is being used to protect various ucounts lifecycle
      management functionalities. However, those services can also be invoked
      when a pidns is being freed in an RCU callback (e.g. softirq context).
      This can lead to deadlocks. There were already efforts trying to
      prevent similar deadlocks in add7c65c ("pid: fix lockdep deadlock
      warning due to ucount_lock"), however they just moved the context
      from hardirq to softrq. Fix this issue once and for all by explictly
      making the lock disable irqs altogether.
      
      Dmitry Vyukov <dvyukov@google.com> reported:
      
      > I've got the following deadlock report while running syzkaller fuzzer
      > on eec0d3d065bfcdf9cd5f56dd2a36b94d12d32297 of linux-next (on odroid
      > device if it matters):
      >
      > =================================
      > [ INFO: inconsistent lock state ]
      > 4.10.0-rc3-next-20170112-xc2-dirty #6 Not tainted
      > ---------------------------------
      > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      > swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      >  (ucounts_lock){+.?...}, at: [<     inline     >] spin_lock
      > ./include/linux/spinlock.h:302
      >  (ucounts_lock){+.?...}, at: [<ffff2000081678c8>]
      > put_ucounts+0x60/0x138 kernel/ucount.c:162
      > {SOFTIRQ-ON-W} state was registered at:
      > [<ffff2000081c82d8>] mark_lock+0x220/0xb60 kernel/locking/lockdep.c:3054
      > [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2941
      > [<ffff2000081c97a8>] __lock_acquire+0x388/0x3260 kernel/locking/lockdep.c:3295
      > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
      > [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
      > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
      > [<     inline     >] spin_lock ./include/linux/spinlock.h:302
      > [<     inline     >] get_ucounts kernel/ucount.c:131
      > [<ffff200008167c28>] inc_ucount+0x80/0x6c8 kernel/ucount.c:189
      > [<     inline     >] inc_mnt_namespaces fs/namespace.c:2818
      > [<ffff200008481850>] alloc_mnt_ns+0x78/0x3a8 fs/namespace.c:2849
      > [<ffff200008487298>] create_mnt_ns+0x28/0x200 fs/namespace.c:2959
      > [<     inline     >] init_mount_tree fs/namespace.c:3199
      > [<ffff200009bd6674>] mnt_init+0x258/0x384 fs/namespace.c:3251
      > [<ffff200009bd60bc>] vfs_caches_init+0x6c/0x80 fs/dcache.c:3626
      > [<ffff200009bb1114>] start_kernel+0x414/0x460 init/main.c:648
      > [<ffff200009bb01e8>] __primary_switched+0x6c/0x70 arch/arm64/kernel/head.S:456
      > irq event stamp: 2316924
      > hardirqs last  enabled at (2316924): [<     inline     >] rcu_do_batch
      > kernel/rcu/tree.c:2911
      > hardirqs last  enabled at (2316924): [<     inline     >]
      > invoke_rcu_callbacks kernel/rcu/tree.c:3182
      > hardirqs last  enabled at (2316924): [<     inline     >]
      > __rcu_process_callbacks kernel/rcu/tree.c:3149
      > hardirqs last  enabled at (2316924): [<ffff200008210414>]
      > rcu_process_callbacks+0x7a4/0xc28 kernel/rcu/tree.c:3166
      > hardirqs last disabled at (2316923): [<     inline     >] rcu_do_batch
      > kernel/rcu/tree.c:2900
      > hardirqs last disabled at (2316923): [<     inline     >]
      > invoke_rcu_callbacks kernel/rcu/tree.c:3182
      > hardirqs last disabled at (2316923): [<     inline     >]
      > __rcu_process_callbacks kernel/rcu/tree.c:3149
      > hardirqs last disabled at (2316923): [<ffff20000820fe80>]
      > rcu_process_callbacks+0x210/0xc28 kernel/rcu/tree.c:3166
      > softirqs last  enabled at (2316912): [<ffff20000811b4c4>]
      > _local_bh_enable+0x4c/0x80 kernel/softirq.c:155
      > softirqs last disabled at (2316913): [<     inline     >]
      > do_softirq_own_stack ./include/linux/interrupt.h:488
      > softirqs last disabled at (2316913): [<     inline     >]
      > invoke_softirq kernel/softirq.c:371
      > softirqs last disabled at (2316913): [<ffff20000811c994>]
      > irq_exit+0x264/0x308 kernel/softirq.c:405
      >
      > other info that might help us debug this:
      >  Possible unsafe locking scenario:
      >
      >        CPU0
      >        ----
      >   lock(ucounts_lock);
      >   <Interrupt>
      >     lock(ucounts_lock);
      >
      >  *** DEADLOCK ***
      >
      > 1 lock held by swapper/2/0:
      >  #0:  (rcu_callback){......}, at: [<     inline     >] __rcu_reclaim
      > kernel/rcu/rcu.h:108
      >  #0:  (rcu_callback){......}, at: [<     inline     >] rcu_do_batch
      > kernel/rcu/tree.c:2919
      >  #0:  (rcu_callback){......}, at: [<     inline     >]
      > invoke_rcu_callbacks kernel/rcu/tree.c:3182
      >  #0:  (rcu_callback){......}, at: [<     inline     >]
      > __rcu_process_callbacks kernel/rcu/tree.c:3149
      >  #0:  (rcu_callback){......}, at: [<ffff200008210390>]
      > rcu_process_callbacks+0x720/0xc28 kernel/rcu/tree.c:3166
      >
      > stack backtrace:
      > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-rc3-next-20170112-xc2-dirty #6
      > Hardware name: Hardkernel ODROID-C2 (DT)
      > Call trace:
      > [<ffff20000808fa60>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:500
      > [<ffff20000808fec0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:225
      > [<ffff2000088a99e0>] dump_stack+0x110/0x168
      > [<ffff2000082fa2b4>] print_usage_bug.part.27+0x49c/0x4bc
      > kernel/locking/lockdep.c:2387
      > [<     inline     >] print_usage_bug kernel/locking/lockdep.c:2357
      > [<     inline     >] valid_state kernel/locking/lockdep.c:2400
      > [<     inline     >] mark_lock_irq kernel/locking/lockdep.c:2617
      > [<ffff2000081c89ec>] mark_lock+0x934/0xb60 kernel/locking/lockdep.c:3065
      > [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2923
      > [<ffff2000081c9a60>] __lock_acquire+0x640/0x3260 kernel/locking/lockdep.c:3295
      > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
      > [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
      > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
      > [<     inline     >] spin_lock ./include/linux/spinlock.h:302
      > [<ffff2000081678c8>] put_ucounts+0x60/0x138 kernel/ucount.c:162
      > [<ffff200008168364>] dec_ucount+0xf4/0x158 kernel/ucount.c:214
      > [<     inline     >] dec_pid_namespaces kernel/pid_namespace.c:89
      > [<ffff200008293dc8>] delayed_free_pidns+0x40/0xe0 kernel/pid_namespace.c:156
      > [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
      > [<     inline     >] rcu_do_batch kernel/rcu/tree.c:2919
      > [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:3182
      > [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:3149
      > [<ffff2000082103d8>] rcu_process_callbacks+0x768/0xc28 kernel/rcu/tree.c:3166
      > [<ffff2000080821dc>] __do_softirq+0x324/0x6e0 kernel/softirq.c:284
      > [<     inline     >] do_softirq_own_stack ./include/linux/interrupt.h:488
      > [<     inline     >] invoke_softirq kernel/softirq.c:371
      > [<ffff20000811c994>] irq_exit+0x264/0x308 kernel/softirq.c:405
      > [<ffff2000081ecc28>] __handle_domain_irq+0xc0/0x150 kernel/irq/irqdesc.c:636
      > [<ffff200008081c80>] gic_handle_irq+0x68/0xd8
      > Exception stack(0xffff8000648e7dd0 to 0xffff8000648e7f00)
      > 7dc0:                                   ffff8000648d4b3c 0000000000000007
      > 7de0: 0000000000000000 1ffff0000c91a967 1ffff0000c91a967 1ffff0000c91a967
      > 7e00: ffff20000a4b6b68 0000000000000001 0000000000000007 0000000000000001
      > 7e20: 1fffe4000149ae90 ffff200009d35000 0000000000000000 0000000000000002
      > 7e40: 0000000000000000 0000000000000000 0000000002624a1a 0000000000000000
      > 7e60: 0000000000000000 ffff200009cbcd88 000060006d2ed000 0000000000000140
      > 7e80: ffff200009cff000 ffff200009cb6000 ffff200009cc2020 ffff200009d2159d
      > 7ea0: 0000000000000000 ffff8000648d4380 0000000000000000 ffff8000648e7f00
      > 7ec0: ffff20000820a478 ffff8000648e7f00 ffff20000820a47c 0000000010000145
      > 7ee0: 0000000000000140 dfff200000000000 ffffffffffffffff ffff20000820a478
      > [<ffff2000080837f8>] el1_irq+0xb8/0x130 arch/arm64/kernel/entry.S:486
      > [<     inline     >] arch_local_irq_restore
      > ./arch/arm64/include/asm/irqflags.h:81
      > [<ffff20000820a47c>] rcu_idle_exit+0x64/0xa8 kernel/rcu/tree.c:1030
      > [<     inline     >] cpuidle_idle_call kernel/sched/idle.c:200
      > [<ffff2000081bcbfc>] do_idle+0x1dc/0x2d0 kernel/sched/idle.c:243
      > [<ffff2000081bd1cc>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:345
      > [<ffff200008099f8c>] secondary_start_kernel+0x2cc/0x358
      > arch/arm64/kernel/smp.c:276
      > [<000000000279f1a4>] 0x279f1a4
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Fixes: add7c65c ("pid: fix lockdep deadlock warning due to ucount_lock")
      Fixes: f333c700 ("pidns: Add a limit on the number of pid namespaces")
      Cc: stable@vger.kernel.org
      Link: https://www.spinics.net/lists/kernel/msg2426637.htmlSigned-off-by: NNikolay Borisov <n.borisov.lkml@gmail.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      880a3854
  8. 20 1月, 2017 2 次提交
  9. 19 1月, 2017 1 次提交
    • D
      bpf: don't trigger OOM killer under pressure with map alloc · d407bd25
      Daniel Borkmann 提交于
      This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(),
      that are to be used for map allocations. Using kmalloc() for very large
      allocations can cause excessive work within the page allocator, so i) fall
      back earlier to vmalloc() when the attempt is considered costly anyway,
      and even more importantly ii) don't trigger OOM killer with any of the
      allocators.
      
      Since this is based on a user space request, for example, when creating
      maps with element pre-allocation, we really want such requests to fail
      instead of killing other user space processes.
      
      Also, don't spam the kernel log with warnings should any of the allocations
      fail under pressure. Given that, we can make backend selection in
      bpf_map_area_alloc() generic, and convert all maps over to use this API
      for spots with potentially large allocation requests.
      
      Note, replacing the one kmalloc_array() is fine as overflow checks happen
      earlier in htab_map_alloc(), since it must also protect the multiplication
      for vmalloc() should kmalloc_array() fail.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d407bd25