1. 07 Sep, 2018: 1 commit
  2. 06 Sep, 2018: 3 commits
    • printk/tracing: Do not trace printk_nmi_enter() · d1c392c9
      Steven Rostedt (VMware) committed
      I hit the following splat in my tests:
      
      ------------[ cut here ]------------
      IRQs not enabled as expected
      WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
      Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      EIP: tick_nohz_idle_enter+0x44/0x8c
      Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
      75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
      e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
      EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
      ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
      CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
      Call Trace:
       do_idle+0x33/0x202
       cpu_startup_entry+0x61/0x63
       start_secondary+0x18e/0x1ed
       startup_32_smp+0x164/0x168
      irq event stamp: 18773830
      hardirqs last  enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
      hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
      softirqs last  enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
      softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
      ---[ end trace b7c64aa79e17954a ]---
      
      After a bit of debugging, I found what was happening. This would trigger
      when performing "perf" with a high NMI interrupt rate, while enabling and
      disabling function tracer. Ftrace uses breakpoints to convert the nops at
      the start of functions to calls to the function trampolines. The breakpoint
      traps disable interrupts and this makes calls into lockdep via the
      trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
      
        do_idle {
      
          [interrupts enabled]
      
          <interrupt> [interrupts disabled]
      	TRACE_IRQS_OFF [lockdep says irqs off]
      	[...]
      	TRACE_IRQS_IRET
      	    test if pt_regs say return to interrupts enabled [yes]
      	    TRACE_IRQS_ON [lockdep says irqs are on]
      
      	    <nmi>
      		nmi_enter() {
      		    printk_nmi_enter() [traced by ftrace]
      		    [ hit ftrace breakpoint ]
      		    <breakpoint exception>
      			TRACE_IRQS_OFF [lockdep says irqs off]
      			[...]
      			TRACE_IRQS_IRET [return from breakpoint]
      			   test if pt_regs say interrupts enabled [no]
      			   [iret back to interrupt]
      	   [iret back to code]
      
          tick_nohz_idle_enter() {
      
      	lockdep_assert_irqs_enabled() [lockdep says no!]
      
      Although interrupts are indeed enabled, lockdep thinks they are not, and since
      we now do asserts via lockdep, it gives a false warning. The issue here is
      that printk_nmi_enter() is called before lockdep_off(), which disables
      lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
      printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
      confused.
      
      Cc: stable@vger.kernel.org
      Fixes: 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI")
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
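The interleaving described in this commit can be modeled in user space. The sketch below is a simplified model, not kernel code: `struct lockdep_model`, `breakpoint_exception()`, `nmi_model()` and `idle_path()` are hypothetical names. It tracks only lockdep's belief about the IRQ state and assumes, as the commit describes, that the breakpoint taken inside the NMI marks IRQs off and does not mark them on again because the saved pt_regs say IRQs were disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of lockdep's view of the hardirq state on one CPU. */
struct lockdep_model {
	bool irqs_on;   /* what lockdep believes, not the real hardware state */
	int recursion;  /* > 0 after lockdep_off(): IRQ tracking is ignored */
};

static void trace_irqs_off(struct lockdep_model *ld)
{
	if (ld->recursion == 0)
		ld->irqs_on = false;
}

static void trace_irqs_on(struct lockdep_model *ld)
{
	if (ld->recursion == 0)
		ld->irqs_on = true;
}

static void lockdep_off_model(struct lockdep_model *ld) { ld->recursion++; }

static void lockdep_on_model(struct lockdep_model *ld)
{
	ld->recursion--;
	assert(ld->recursion >= 0);
}

/* Exception entry marks IRQs off; TRACE_IRQS_IRET marks them on again
 * only if the interrupted context (pt_regs) had IRQs enabled. */
static void breakpoint_exception(struct lockdep_model *ld, bool regs_irqs_on)
{
	trace_irqs_off(ld);
	if (regs_irqs_on)
		trace_irqs_on(ld);
}

/* nmi_enter(): printk_nmi_enter() runs before lockdep_off(). If ftrace is
 * patching it, a breakpoint fires from NMI context, where IRQs are off. */
static void nmi_model(struct lockdep_model *ld, bool printk_nmi_enter_traced)
{
	if (printk_nmi_enter_traced)
		breakpoint_exception(ld, false);
	lockdep_off_model(ld);
	/* ... NMI handler body: IRQ tracking suppressed ... */
	lockdep_on_model(ld);
}

/* Returns lockdep's belief when tick_nohz_idle_enter() asserts IRQs on. */
static bool idle_path(bool printk_nmi_enter_traced)
{
	struct lockdep_model ld = { .irqs_on = true, .recursion = 0 };

	trace_irqs_off(&ld);                     /* interrupt: TRACE_IRQS_OFF */
	trace_irqs_on(&ld);                      /* TRACE_IRQS_IRET: IRQs on */
	nmi_model(&ld, printk_nmi_enter_traced); /* NMI lands before the iret */
	return ld.irqs_on;
}
```

In this model `idle_path(true)` returns false, the stale state that trips lockdep_assert_irqs_enabled(), while `idle_path(false)` returns true, which corresponds to keeping ftrace away from printk_nmi_enter() with a notrace annotation.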
    • cpu/hotplug: Prevent state corruption on error rollback · 69fa6eb7
      Thomas Gleixner committed
      When a teardown callback fails, the CPU hotplug code brings the CPU back to
      the previous state. The previous state becomes the new target state. The
      rollback happens in undo_cpu_down() which increments the state
      unconditionally even if the state is already the same as the target.
      
      As a consequence, the next CPU hotplug operation will start at the wrong
      state. This is easy to observe when __cpu_disable() fails.
      
      Prevent the unconditional undo by checking the state vs. target before
      incrementing state and fix up the consequently wrong conditional in the
      unplug code which handles the failure of the final CPU take down on the
      control CPU side.
      
      Fixes: 4dddfb5f ("smp/hotplug: Rewrite AP state machine core")
      Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Tested-by: Sudeep Holla <sudeep.holla@arm.com>
      Tested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Cc: josh@joshtriplett.org
      Cc: peterz@infradead.org
      Cc: jiangshanlai@gmail.com
      Cc: dzickus@redhat.com
      Cc: brendan.jackman@arm.com
      Cc: malat@debian.org
      Cc: sramana@codeaurora.org
      Cc: linux-arm-msm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de
      
      ----
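A simplified model of the rollback bug, assuming hotplug states are plain integers and that undoing a failed teardown walks the state back up toward the target. `struct hp_model`, `undo_down_unchecked()` and `undo_down_checked()` are hypothetical; they mirror the description above, not the actual kernel diff.

```c
#include <assert.h>

/* Hypothetical, simplified per-CPU hotplug state. */
struct hp_model {
	int state;     /* current state */
	int target;    /* rollback target (the previous state) */
	int callbacks; /* number of setup callbacks re-invoked */
};

/* Before the fix: the state is incremented unconditionally, even when it
 * already equals the target, so it overshoots by one. */
static void undo_down_unchecked(struct hp_model *st)
{
	for (st->state++; st->state < st->target; st->state++)
		st->callbacks++;
}

/* After the fix: compare state vs. target before incrementing. */
static void undo_down_checked(struct hp_model *st)
{
	assert(st->state <= st->target); /* rollback never starts past the target */
	if (st->state == st->target)
		return;
	for (st->state++; st->state < st->target; st->state++)
		st->callbacks++;
}
```

With `state == target == 5`, the unchecked variant leaves `state` at 6, which is where the next hotplug operation would wrongly start; the checked variant leaves it at 5.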
    • cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun() · f8b7530a
      Neeraj Upadhyay committed
      The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
      load of st->should_run to prevent reordering of the later loads/stores
      w.r.t. the load of st->should_run.
      
      Fixes: 4dddfb5f ("smp/hotplug: Rewrite AP state machine core")
      Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infraded.org>
      Cc: josh@joshtriplett.org
      Cc: peterz@infradead.org
      Cc: jiangshanlai@gmail.com
      Cc: dzickus@redhat.com
      Cc: brendan.jackman@arm.com
      Cc: malat@debian.org
      Cc: mojha@codeaurora.org
      Cc: sramana@codeaurora.org
      Cc: linux-arm-msm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org
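A user-space analogue of the barrier placement, assuming the kernel's smp_mb() can be approximated by C11 `atomic_thread_fence(memory_order_seq_cst)`. `struct cpuhp_model`, `kick()` and `thread_fun()` are hypothetical stand-ins, not the kernel's cpuhp code. The point is where the reader's fence sits: after the flag load, so later accesses cannot be hoisted above it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields that matter here. */
struct cpuhp_model {
	atomic_int state;       /* payload published by the control CPU */
	atomic_bool should_run; /* flag telling the hotplug thread to run */
};

static void cpuhp_model_init(struct cpuhp_model *st)
{
	atomic_init(&st->state, 0);
	atomic_init(&st->should_run, false);
}

/* Control side: publish the payload, full barrier, then set the flag. */
static void kick(struct cpuhp_model *st, int state)
{
	atomic_store_explicit(&st->state, state, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&st->should_run, true, memory_order_relaxed);
}

/* Thread side: the full barrier sits AFTER the should_run load, so the
 * later load of state (and the store clearing should_run) cannot be
 * reordered before it. Returns -1 when there is nothing to do. */
static int thread_fun(struct cpuhp_model *st)
{
	if (!atomic_load_explicit(&st->should_run, memory_order_relaxed))
		return -1;
	atomic_thread_fence(memory_order_seq_cst);
	int s = atomic_load_explicit(&st->state, memory_order_relaxed);
	atomic_store_explicit(&st->should_run, false, memory_order_relaxed);
	return s;
}
```

The fences pair up: if `thread_fun()` observes `should_run == true` written by `kick()`, it is guaranteed to see the published `state`. Placing the reader's fence before the flag load would not give that guarantee.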
  3. 05 Sep, 2018: 1 commit
  4. 03 Sep, 2018: 1 commit
    • bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP · 597222f7
      John Fastabend committed
      Currently we check that sk_user_data is non-NULL to determine if the sk
      exists in a map. However, this is not sufficient to ensure the psock
      or the ULP ops are not in use by another user, such as kcm or TLS. To
      avoid this, when adding a sock to a map also verify that it is of the
      correct ULP type. Additionally, when releasing a psock, verify that
      it is the TCP_ULP_BPF type before releasing the ULP. The error path
      taken when we abort an update due to a ULP collision can trigger this
      case.
      
      For example,
      
        __sock_map_ctx_update_elem()
           [...]
           err = tcp_set_ulp_id(sock, TCP_ULP_BPF) <- collides with TLS
           if (err)                                <- so err out here
              goto out_free
           [...]
        out_free:
           smap_release_sock() <- calling tcp_cleanup_ulp releases the
                                  TLS ULP incorrectly.
      
      Fixes: 2f857d04 ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
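A minimal model of the two checks this commit describes: verify the ULP type on update, and only tear down the ULP on release if it is really the BPF one. The enum values, `struct ulp_sock_model`, `set_ulp()`, `release_sock_model()` and `map_update()` are hypothetical; the real kernel ULP registry and sockmap code differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical ULP identifiers. */
enum ulp_model { ULP_NONE, ULP_TLS, ULP_KCM, ULP_BPF };

struct ulp_sock_model {
	enum ulp_model ulp;  /* which ULP currently owns the socket */
	void *sk_user_data;  /* psock pointer when owned by sockmap */
};

/* Attaching a ULP fails if a different one is already attached. */
static int set_ulp(struct ulp_sock_model *sk, enum ulp_model id)
{
	if (sk->ulp != ULP_NONE && sk->ulp != id)
		return -EEXIST;
	sk->ulp = id;
	return 0;
}

/* Release path: only tear down the ULP if it is really ours. */
static void release_sock_model(struct ulp_sock_model *sk)
{
	if (sk->ulp != ULP_BPF)
		return; /* e.g. TLS owns it: leave it alone */
	sk->ulp = ULP_NONE;
	sk->sk_user_data = NULL;
}

/* Mirrors the update/error path sketched in the commit message. */
static int map_update(struct ulp_sock_model *sk, void *psock)
{
	int err;

	/* sk_user_data alone does not prove sockmap ownership */
	if (sk->sk_user_data != NULL && sk->ulp != ULP_BPF)
		return -EBUSY;
	err = set_ulp(sk, ULP_BPF);
	if (err)
		goto out_free;
	sk->sk_user_data = psock;
	return 0;
out_free:
	release_sock_model(sk); /* safe: does not release a foreign ULP */
	return err;
}
```

With a socket already using TLS, `map_update()` fails with `-EEXIST` and the TLS ULP survives the error path, which is the behavior the fix restores.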
  5. 01 Sep, 2018: 1 commit
  6. 31 Aug, 2018: 1 commit
  7. 30 Aug, 2018: 2 commits
  8. 28 Aug, 2018: 3 commits
    • bpf: sockmap, decrement copied count correctly in redirect error case · 501ca817
      John Fastabend committed
      Currently, when a redirect occurs in sockmap and an error occurs in
      the redirect call, we unwind the scatterlist once in the error path
      of bpf_tcp_sendmsg_do_redirect() and then again in sendmsg(). Then
      in the error path of sendmsg we decrement the copied count by the
      send size.
      
      However, it's possible we partially sent data before the error was
      generated. This can happen if do_tcp_sendpages() partially sends the
      scatterlist before encountering a memory pressure error. If this
      happens we need to decrement the copied value (the value tracking
      how many bytes were actually sent to TCP stack) by the number of
      remaining bytes _not_ the entire send size. Otherwise we risk
      confusing userspace.
      
      Also, we don't need two calls to free the scatterlist; one is
      good enough. So remove the one in bpf_tcp_sendmsg_do_redirect() and
      then properly reduce copied by the number of remaining bytes, which
      may in fact be the entire send size if no bytes were sent.
      
      To do this, use a bool to indicate whether free_start_sg() should do
      memory accounting.
      
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
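A worked example of the accounting, assuming (for illustration only) that `copied` already counts the whole `send` bytes of the redirect and that `sent` bytes reached TCP before the error. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Old behavior: subtract the entire send size, ignoring partial progress. */
static size_t copied_after_error_old(size_t copied, size_t send, size_t sent)
{
	(void)sent; /* partial progress is ignored */
	return copied - send;
}

/* Fixed behavior: subtract only the bytes that were NOT sent. */
static size_t copied_after_error_fixed(size_t copied, size_t send, size_t sent)
{
	assert(sent <= send && send <= copied);
	return copied - (send - sent);
}
```

For `copied = 1000`, `send = 400` and `sent = 150`, the old rule reports 600 bytes while the fixed rule reports 750, since 150 of the 400 bytes actually went out. When nothing was sent, both rules give 600.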
    • bpf, sockmap: fix psock refcount leak in bpf_tcp_recvmsg · 15c480ef
      Daniel Borkmann committed
      In bpf_tcp_recvmsg() we first take a reference on the psock; however,
      once we find that there are skbs in the normal socket's receive queue,
      we return by processing them through tcp_recvmsg(). The problem is that
      we leak the taken reference on the psock in that path. Given we don't
      really do anything with the psock at this point, move the skb_queue_empty()
      test before we fetch the psock to fix this case.
      
      Fixes: 8934ce2f ("bpf: sockmap redirect ingress support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
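A minimal model of the leak and of the reordering fix. `struct psock_model`, `struct rx_sock_model`, `psock_get()`/`psock_put()` and the two recvmsg variants are hypothetical; a non-zero `rx_queue_len` stands in for a non-empty receive queue (the skb_queue_empty() test), and returning true stands in for handing off to tcp_recvmsg().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a refcounted psock and a socket's receive queue. */
struct psock_model { int refcnt; };
struct rx_sock_model { int rx_queue_len; struct psock_model *psock; };

static struct psock_model *psock_get(struct rx_sock_model *sk)
{
	sk->psock->refcnt++;
	return sk->psock;
}

static void psock_put(struct psock_model *psock)
{
	psock->refcnt--;
	assert(psock->refcnt >= 0);
}

/* Before: the reference is taken first and leaked on the early return. */
static bool recvmsg_old(struct rx_sock_model *sk)
{
	struct psock_model *psock = psock_get(sk);

	if (sk->rx_queue_len > 0)
		return true; /* tcp_recvmsg() path: reference leaked */
	/* ... consume psock ingress data ... */
	psock_put(psock);
	return false;
}

/* After: test the receive queue before touching the psock. */
static bool recvmsg_fixed(struct rx_sock_model *sk)
{
	struct psock_model *psock;

	if (sk->rx_queue_len > 0)
		return true; /* tcp_recvmsg() path: no reference taken */
	psock = psock_get(sk);
	/* ... consume psock ingress data ... */
	psock_put(psock);
	return false;
}
```

Starting from a reference count of 1, the old variant leaves it at 2 whenever the receive queue is non-empty; the fixed variant always returns it to 1.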
    • bpf, sockmap: fix potential use after free in bpf_tcp_close · e06fa9c1
      Daniel Borkmann committed
      In bpf_tcp_close() we pop the psock linkage to a map via psock_map_pop().
      A parallel update on the sock hash map can happen between psock_map_pop()
      and lookup_elem_raw() where we override the element under link->hash /
      link->key. In bpf_tcp_close()'s lookup_elem_raw() we subsequently only
      test whether an element is present, but we do not test whether the
      element is in fact the element we were looking for.
      
      We lock the sock in bpf_tcp_close() during that time, and so do we
      in sock_hash_update_elem(). However, the latter locks the
      sock which is newly updated, not the one we're purging from the hash
      table. This means that while one CPU is doing the lookup from bpf_tcp_close(),
      another CPU doing the map update in parallel may already have dropped our
      sock from the hlist and released the psock.
      
      Subsequently, the first CPU finds the new sock and attempts to drop
      and release the old sock yet another time. The fix is to check
      the elements for a match after lookup, similar to what we do in the sock map.
      Note that the hash table elements are freed via RCU, so access to their
      link->hash / link->key is fine since we're on the RCU read side there.
      
      Fixes: e9db4ef6 ("bpf: sockhash fix omitted bucket lock in sock_close")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
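A minimal model of the "check the element for a match after lookup" fix. `struct htab_elem_model`, `struct link_model`, `lookup_model()` and `close_should_unlink()` are hypothetical; a flat array of slots stands in for a hash bucket, and no locking or RCU is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hash-table element and the sock's back-link to it. */
struct htab_elem_model { unsigned int key; const void *sk; };
struct link_model { const struct htab_elem_model *hash; unsigned int key; };

/* Like a raw lookup: returns whatever element currently holds `key`. */
static const struct htab_elem_model *
lookup_model(const struct htab_elem_model **slots, size_t n, unsigned int key)
{
	for (size_t i = 0; i < n; i++)
		if (slots[i] != NULL && slots[i]->key == key)
			return slots[i];
	return NULL;
}

/* Fixed close path: unlink only if the element found is the one this sock
 * was linked to; a parallel update may have replaced it under the same key. */
static bool close_should_unlink(const struct htab_elem_model **slots, size_t n,
				const struct link_model *link)
{
	const struct htab_elem_model *e = lookup_model(slots, n, link->key);

	return e != NULL && e == link->hash;
}
```

If a parallel update has replaced the old element with a new one under the same key, the presence-only test would still find "an element" and drop the new sock; comparing against `link->hash` makes the close path skip it.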
  9. 24 Aug, 2018: 4 commits
  10. 23 Aug, 2018: 23 commits