1. 05 2月, 2019 5 次提交
    • U
      net/smc: correct state change for peer closing · 84b799a2
      Ursula Braun 提交于
      If some kind of closing is received from the peer while still in
      state SMC_INIT, it means the peer has had an active connection and
      closed the socket quickly before listen_work finished. This should
      not result in a shortcut from state SMC_INIT to state SMC_CLOSED.
      This patch adds the socket to the accept queue in state
      SMC_APPCLOSEWAIT1. The socket reaches state SMC_CLOSED once being
      accepted and closed with smc_release().
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84b799a2
    • U
      net/smc: delete rkey first before switching to unused · a5e04318
      Ursula Braun 提交于
      Once RMBs are flagged as unused they are candidates for reuse.
      Thus the LLC DELETE RKEY operaton should be made before flagging
      the RMB as unused.
      
      Fixes: c7674c00 ("net/smc: unregister rkeys of unused buffer")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5e04318
    • U
      net/smc: fix sender_free computation · b8649efa
      Ursula Braun 提交于
      In some scenarios a separate consumer cursor update is necessary.
      The decision is made in smc_tx_consumer_cursor_update(). The
      sender_free computation could be wrong:
      
      The rx confirmed cursor is always smaller than or equal to the
      rx producer cursor. The parameters in the smc_curs_diff() call
      have to be exchanged, otherwise sender_free might even be negative.
      
      And if more data arrives local_rx_ctrl.prod might be updated, enabling
      a cursor difference between local_rx_ctrl.prod and rx confirmed cursor
      larger than the RMB size. This case is not covered by smc_curs_diff().
      Thus function smc_curs_diff_large() is introduced here.
      
      If a recvmsg() is processed in parallel, local_tx_ctrl.cons might
      change during smc_cdc_msg_send. Make sure rx_curs_confirmed is updated
      with the actually sent local_tx_ctrl.cons value.
      
      Fixes: e82f2e31 ("net/smc: optimize consumer cursor updates")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8649efa
    • U
      net/smc: preallocated memory for rdma work requests · ad6f317f
      Ursula Braun 提交于
      The work requests for rdma writes are built in local variables within
      function smc_tx_rdma_write(). This violates the rule that the work
      request storage has to stay till the work request is confirmed by
      a completion queue response.
      This patch introduces preallocated memory for these work requests.
      The storage is allocated, once a link (and thus a queue pair) is
      established.
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad6f317f
    • S
      net: dp83640: expire old TX-skb · 53bc8d2a
      Sebastian Andrzej Siewior 提交于
      During sendmsg() a cloned skb is saved via dp83640_txtstamp() in
      ->tx_queue. After the NIC sends this packet, the PHY will reply with a
      timestamp for that TX packet. If the cable is pulled at the right time I
      don't see that packet. It might gets flushed as part of queue shutdown
      on NIC's side.
      Once the link is up again then after the next sendmsg() we enqueue
      another skb in dp83640_txtstamp() and have two on the list. Then the PHY
      will send a reply and decode_txts() attaches it to the first skb on the
      list.
      No crash occurs since refcounting works but we are one packet behind.
      linuxptp/ptp4l usually closes the socket and opens a new one (in such a
      timeout case) so those "stale" replies never get there. However it does
      not resume normal operation anymore.
      
      Purge old skbs in decode_txts().
      
      Fixes: cb646e2b ("ptp: Added a clock driver for the National Semiconductor PHYTER.")
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: NKurt Kanzenbach <kurt@linutronix.de>
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53bc8d2a
  2. 04 2月, 2019 6 次提交
  3. 03 2月, 2019 1 次提交
  4. 02 2月, 2019 17 次提交
  5. 01 2月, 2019 11 次提交
    • J
      cfg80211: call disconnect_wk when AP stops · e005bd7d
      Johannes Berg 提交于
      Since we now prevent regulatory restore during STA disconnect
      if concurrent AP interfaces are active, we need to reschedule
      this check when the AP state changes. This fixes never doing
      a restore when an AP is the last interface to stop. Or to put
      it another way: we need to re-check after anything we check
      here changes.
      
      Cc: stable@vger.kernel.org
      Fixes: 113f3aaa ("cfg80211: Prevent regulatory restore during STA disconnect in concurrent interfaces")
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      e005bd7d
    • F
      mac80211: ensure that mgmt tx skbs have tailroom for encryption · 9d0f50b8
      Felix Fietkau 提交于
      Some drivers use IEEE80211_KEY_FLAG_SW_MGMT_TX to indicate that management
      frames need to be software encrypted. Since normal data packets are still
      encrypted by the hardware, crypto_tx_tailroom_needed_cnt gets decremented
      after key upload to hw. This can lead to passing skbs to ccmp_encrypt_skb,
      which don't have the necessary tailroom for software encryption.
      
      Change the code to add tailroom for encrypted management packets, even if
      crypto_tx_tailroom_needed_cnt is 0.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFelix Fietkau <nbd@nbd.name>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      9d0f50b8
    • D
      Merge branch 'bpf-lockdep-fixes' · f01c2803
      Daniel Borkmann 提交于
      Alexei Starovoitov says:
      
      ====================
      v1->v2:
      - reworded 2nd patch. It's a real dead lock. Not a false positive
      - dropped the lockdep fix for up_read_non_owner in bpf_get_stackid
      
      In addition to preempt_disable patch for socket filters
      https://patchwork.ozlabs.org/patch/1032437/
      First patch fixes lockdep false positive in percpu_freelist
      Second patch fixes potential deadlock in bpf_prog_register
      Third patch fixes another potential deadlock in stackmap access
      from tracing bpf prog and from syscall.
      ====================
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f01c2803
    • M
      bpf: Fix syscall's stackmap lookup potential deadlock · 7c4cd051
      Martin KaFai Lau 提交于
      The map_lookup_elem used to not acquiring spinlock
      in order to optimize the reader.
      
      It was true until commit 557c0c6e ("bpf: convert stackmap to pre-allocation")
      The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
      bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
      If that is the case, pcpu_freelist_push() saves this elem for reuse later.
      This push requires a spinlock.
      
      If a tracing bpf_prog got run in the middle of the syscall's
      map_lookup_elem(stackmap) and this tracing bpf_prog is calling
      bpf_get_stackid(stackmap) which also requires the same pcpu_freelist's
      spinlock, it may end up with a dead lock situation as reported by
      Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/
      
      The situation is the same as the syscall's map_update_elem() which
      needs to acquire the pcpu_freelist's spinlock and could race
      with tracing bpf_prog.  Hence, this patch fixes it by protecting
      bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
      to prevent tracing bpf_prog from running.
      
      A later syscall's map_lookup_elem commit f1a2e44a ("bpf: add queue and stack maps")
      also acquires a spinlock and races with tracing bpf_prog similarly.
      Hence, this patch is forward looking and protects the majority
      of the map lookups.  bpf_map_offload_lookup_elem() is the exception
      since it is for network bpf_prog only (i.e. never called by tracing
      bpf_prog).
      
      Fixes: 557c0c6e ("bpf: convert stackmap to pre-allocation")
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      7c4cd051
    • A
      bpf: fix potential deadlock in bpf_prog_register · e16ec340
      Alexei Starovoitov 提交于
      Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
      [   13.007000] WARNING: possible circular locking dependency detected
      [   13.007587] 5.0.0-rc3-00018-g2fa53f89-dirty #477 Not tainted
      [   13.008124] ------------------------------------------------------
      [   13.008624] test_progs/246 is trying to acquire lock:
      [   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
      [   13.009770]
      [   13.009770] but task is already holding lock:
      [   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.010877]
      [   13.010877] which lock already depends on the new lock.
      [   13.010877]
      [   13.011532]
      [   13.011532] the existing dependency chain (in reverse order) is:
      [   13.012129]
      [   13.012129] -> #4 (bpf_event_mutex){+.+.}:
      [   13.012582]        perf_event_query_prog_array+0x9b/0x130
      [   13.013016]        _perf_ioctl+0x3aa/0x830
      [   13.013354]        perf_ioctl+0x2e/0x50
      [   13.013668]        do_vfs_ioctl+0x8f/0x6a0
      [   13.014003]        ksys_ioctl+0x70/0x80
      [   13.014320]        __x64_sys_ioctl+0x16/0x20
      [   13.014668]        do_syscall_64+0x4a/0x180
      [   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.015469]
      [   13.015469] -> #3 (&cpuctx_mutex){+.+.}:
      [   13.015910]        perf_event_init_cpu+0x5a/0x90
      [   13.016291]        perf_event_init+0x1b2/0x1de
      [   13.016654]        start_kernel+0x2b8/0x42a
      [   13.016995]        secondary_startup_64+0xa4/0xb0
      [   13.017382]
      [   13.017382] -> #2 (pmus_lock){+.+.}:
      [   13.017794]        perf_event_init_cpu+0x21/0x90
      [   13.018172]        cpuhp_invoke_callback+0xb3/0x960
      [   13.018573]        _cpu_up+0xa7/0x140
      [   13.018871]        do_cpu_up+0xa4/0xc0
      [   13.019178]        smp_init+0xcd/0xd2
      [   13.019483]        kernel_init_freeable+0x123/0x24f
      [   13.019878]        kernel_init+0xa/0x110
      [   13.020201]        ret_from_fork+0x24/0x30
      [   13.020541]
      [   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
      [   13.021051]        static_key_slow_inc+0xe/0x20
      [   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
      [   13.021891]        perf_trace_event_init+0x11f/0x250
      [   13.022297]        perf_trace_init+0x6b/0xa0
      [   13.022644]        perf_tp_event_init+0x25/0x40
      [   13.023011]        perf_try_init_event+0x6b/0x90
      [   13.023386]        perf_event_alloc+0x9a8/0xc40
      [   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
      [   13.024173]        do_syscall_64+0x4a/0x180
      [   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.024968]
      [   13.024968] -> #0 (tracepoints_mutex){+.+.}:
      [   13.025434]        __mutex_lock+0x86/0x970
      [   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
      [   13.026215]        bpf_probe_register+0x40/0x60
      [   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.027042]        __do_sys_bpf+0x94f/0x1a90
      [   13.027389]        do_syscall_64+0x4a/0x180
      [   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.028171]
      [   13.028171] other info that might help us debug this:
      [   13.028171]
      [   13.028807] Chain exists of:
      [   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
      [   13.028807]
      [   13.029666]  Possible unsafe locking scenario:
      [   13.029666]
      [   13.030140]        CPU0                    CPU1
      [   13.030510]        ----                    ----
      [   13.030875]   lock(bpf_event_mutex);
      [   13.031166]                                lock(&cpuctx_mutex);
      [   13.031645]                                lock(bpf_event_mutex);
      [   13.032135]   lock(tracepoints_mutex);
      [   13.032441]
      [   13.032441]  *** DEADLOCK ***
      [   13.032441]
      [   13.032911] 1 lock held by test_progs/246:
      [   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.033909]
      [   13.033909] stack backtrace:
      [   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f89-dirty #477
      [   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   13.035657] Call Trace:
      [   13.035859]  dump_stack+0x5f/0x8b
      [   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
      [   13.036526]  __lock_acquire+0x1158/0x1350
      [   13.036852]  ? lock_acquire+0x98/0x190
      [   13.037154]  lock_acquire+0x98/0x190
      [   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.037876]  __mutex_lock+0x86/0x970
      [   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.039028]  ? __mutex_lock+0x86/0x970
      [   13.039337]  ? __mutex_lock+0x24a/0x970
      [   13.039649]  ? bpf_probe_register+0x1d/0x60
      [   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
      [   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
      [   13.041325]  bpf_probe_register+0x40/0x60
      [   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.042068]  ? __might_fault+0x3e/0x90
      [   13.042374]  __do_sys_bpf+0x94f/0x1a90
      [   13.042678]  do_syscall_64+0x4a/0x180
      [   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.043382] RIP: 0033:0x7f23b10a07f9
      [   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
      [   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
      [   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
      [   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
      [   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
      [   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
      
      Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
      there is no need to take bpf_event_mutex too.
      bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
      bpf_raw_tracepoints don't need to take this mutex.
      
      Fixes: c4f6699d ("bpf: introduce BPF_RAW_TRACEPOINT")
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      e16ec340
    • A
      bpf: fix lockdep false positive in percpu_freelist · a89fac57
      Alexei Starovoitov 提交于
      Lockdep warns about false positive:
      [   12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
      [   12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
      [   12.493275]  (&rq->lock){-.-.}
      [   12.493276]
      [   12.493276]
      [   12.493276] and interrupts could create inverse lock ordering between them.
      [   12.493276]
      [   12.494435]
      [   12.494435] other info that might help us debug this:
      [   12.494979]  Possible interrupt unsafe locking scenario:
      [   12.494979]
      [   12.495518]        CPU0                    CPU1
      [   12.495879]        ----                    ----
      [   12.496243]   lock(&head->lock);
      [   12.496502]                                local_irq_disable();
      [   12.496969]                                lock(&rq->lock);
      [   12.497431]                                lock(&head->lock);
      [   12.497890]   <Interrupt>
      [   12.498104]     lock(&rq->lock);
      [   12.498368]
      [   12.498368]  *** DEADLOCK ***
      [   12.498368]
      [   12.498837] 1 lock held by dd/276:
      [   12.499110]  #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
      [   12.499747]
      [   12.499747] the shortest dependencies between 2nd lock and 1st lock:
      [   12.500389]  -> (&rq->lock){-.-.} {
      [   12.500669]     IN-HARDIRQ-W at:
      [   12.500934]                       _raw_spin_lock+0x2f/0x40
      [   12.501373]                       scheduler_tick+0x4c/0xf0
      [   12.501812]                       update_process_times+0x40/0x50
      [   12.502294]                       tick_periodic+0x27/0xb0
      [   12.502723]                       tick_handle_periodic+0x1f/0x60
      [   12.503203]                       timer_interrupt+0x11/0x20
      [   12.503651]                       __handle_irq_event_percpu+0x43/0x2c0
      [   12.504167]                       handle_irq_event_percpu+0x20/0x50
      [   12.504674]                       handle_irq_event+0x37/0x60
      [   12.505139]                       handle_level_irq+0xa7/0x120
      [   12.505601]                       handle_irq+0xa1/0x150
      [   12.506018]                       do_IRQ+0x77/0x140
      [   12.506411]                       ret_from_intr+0x0/0x1d
      [   12.506834]                       _raw_spin_unlock_irqrestore+0x53/0x60
      [   12.507362]                       __setup_irq+0x481/0x730
      [   12.507789]                       setup_irq+0x49/0x80
      [   12.508195]                       hpet_time_init+0x21/0x32
      [   12.508644]                       x86_late_time_init+0xb/0x16
      [   12.509106]                       start_kernel+0x390/0x42a
      [   12.509554]                       secondary_startup_64+0xa4/0xb0
      [   12.510034]     IN-SOFTIRQ-W at:
      [   12.510305]                       _raw_spin_lock+0x2f/0x40
      [   12.510772]                       try_to_wake_up+0x1c7/0x4e0
      [   12.511220]                       swake_up_locked+0x20/0x40
      [   12.511657]                       swake_up_one+0x1a/0x30
      [   12.512070]                       rcu_process_callbacks+0xc5/0x650
      [   12.512553]                       __do_softirq+0xe6/0x47b
      [   12.512978]                       irq_exit+0xc3/0xd0
      [   12.513372]                       smp_apic_timer_interrupt+0xa9/0x250
      [   12.513876]                       apic_timer_interrupt+0xf/0x20
      [   12.514343]                       default_idle+0x1c/0x170
      [   12.514765]                       do_idle+0x199/0x240
      [   12.515159]                       cpu_startup_entry+0x19/0x20
      [   12.515614]                       start_kernel+0x422/0x42a
      [   12.516045]                       secondary_startup_64+0xa4/0xb0
      [   12.516521]     INITIAL USE at:
      [   12.516774]                      _raw_spin_lock_irqsave+0x38/0x50
      [   12.517258]                      rq_attach_root+0x16/0xd0
      [   12.517685]                      sched_init+0x2f2/0x3eb
      [   12.518096]                      start_kernel+0x1fb/0x42a
      [   12.518525]                      secondary_startup_64+0xa4/0xb0
      [   12.518986]   }
      [   12.519132]   ... key      at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
      [   12.519649]   ... acquired at:
      [   12.519892]    pcpu_freelist_pop+0x7b/0xd0
      [   12.520221]    bpf_get_stackid+0x1d2/0x4d0
      [   12.520563]    ___bpf_prog_run+0x8b4/0x11a0
      [   12.520887]
      [   12.521008] -> (&head->lock){+...} {
      [   12.521292]    HARDIRQ-ON-W at:
      [   12.521539]                     _raw_spin_lock+0x2f/0x40
      [   12.521950]                     pcpu_freelist_push+0x2a/0x40
      [   12.522396]                     bpf_get_stackid+0x494/0x4d0
      [   12.522828]                     ___bpf_prog_run+0x8b4/0x11a0
      [   12.523296]    INITIAL USE at:
      [   12.523537]                    _raw_spin_lock+0x2f/0x40
      [   12.523944]                    pcpu_freelist_populate+0xc0/0x120
      [   12.524417]                    htab_map_alloc+0x405/0x500
      [   12.524835]                    __do_sys_bpf+0x1a3/0x1a90
      [   12.525253]                    do_syscall_64+0x4a/0x180
      [   12.525659]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   12.526167]  }
      [   12.526311]  ... key      at: [<ffffffff838f7668>] __key.13130+0x0/0x8
      [   12.526812]  ... acquired at:
      [   12.527047]    __lock_acquire+0x521/0x1350
      [   12.527371]    lock_acquire+0x98/0x190
      [   12.527680]    _raw_spin_lock+0x2f/0x40
      [   12.527994]    pcpu_freelist_push+0x2a/0x40
      [   12.528325]    bpf_get_stackid+0x494/0x4d0
      [   12.528645]    ___bpf_prog_run+0x8b4/0x11a0
      [   12.528970]
      [   12.529092]
      [   12.529092] stack backtrace:
      [   12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f89 #475
      [   12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   12.530750] Call Trace:
      [   12.530948]  dump_stack+0x5f/0x8b
      [   12.531248]  check_usage_backwards+0x10c/0x120
      [   12.531598]  ? ___bpf_prog_run+0x8b4/0x11a0
      [   12.531935]  ? mark_lock+0x382/0x560
      [   12.532229]  mark_lock+0x382/0x560
      [   12.532496]  ? print_shortest_lock_dependencies+0x180/0x180
      [   12.532928]  __lock_acquire+0x521/0x1350
      [   12.533271]  ? find_get_entry+0x17f/0x2e0
      [   12.533586]  ? find_get_entry+0x19c/0x2e0
      [   12.533902]  ? lock_acquire+0x98/0x190
      [   12.534196]  lock_acquire+0x98/0x190
      [   12.534482]  ? pcpu_freelist_push+0x2a/0x40
      [   12.534810]  _raw_spin_lock+0x2f/0x40
      [   12.535099]  ? pcpu_freelist_push+0x2a/0x40
      [   12.535432]  pcpu_freelist_push+0x2a/0x40
      [   12.535750]  bpf_get_stackid+0x494/0x4d0
      [   12.536062]  ___bpf_prog_run+0x8b4/0x11a0
      
      It has been explained that is a false positive here:
      https://lkml.org/lkml/2018/7/25/756
      Recap:
      - stackmap uses pcpu_freelist
      - The lock in pcpu_freelist is a percpu lock
      - stackmap is only used by tracing bpf_prog
      - A tracing bpf_prog cannot be run if another bpf_prog
        has already been running (ensured by the percpu bpf_prog_active counter).
      
      Eric pointed out that this lockdep splats stops other
      legit lockdep splats in selftests/bpf/test_progs.c.
      
      Fix this by calling local_irq_save/restore for stackmap.
      
      Another false positive had also been worked around by calling
      local_irq_save in commit 89ad2fa3 ("bpf: fix lockdep splat").
      That commit added unnecessary irq_save/restore to fast path of
      bpf hash map. irqs are already disabled at that point, since htab
      is holding per bucket spin_lock with irqsave.
      
      Let's reduce overhead for htab by introducing __pcpu_freelist_push/pop
      function w/o irqsave and convert pcpu_freelist_push/pop to irqsave
      to be used elsewhere (right now only in stackmap).
      It stops lockdep false positive in stackmap with a bit of acceptable overhead.
      
      Fixes: 557c0c6e ("bpf: convert stackmap to pre-allocation")
      Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      a89fac57
    • A
      bpf: run bpf programs with preemption disabled · 6cab5e90
      Alexei Starovoitov 提交于
      Disabled preemption is necessary for proper access to per-cpu maps
      from BPF programs.
      
      But the sender side of socket filters didn't have preemption disabled:
      unix_dgram_sendmsg->sk_filter->sk_filter_trim_cap->bpf_prog_run_save_cb->BPF_PROG_RUN
      
      and a combination of af_packet with tun device didn't disable either:
      tpacket_snd->packet_direct_xmit->packet_pick_tx_queue->ndo_select_queue->
        tun_select_queue->tun_ebpf_select_queue->bpf_prog_run_clear_cb->BPF_PROG_RUN
      
      Disable preemption before executing BPF programs (both classic and extended).
      Reported-by: NJann Horn <jannh@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      6cab5e90
    • M
      bpf, selftests: fix handling of sparse CPU allocations · 1bb54c40
      Martynas Pumputis 提交于
      Previously, bpf_num_possible_cpus() had a bug when calculating a
      number of possible CPUs in the case of sparse CPU allocations, as
      it was considering only the first range or element of
      /sys/devices/system/cpu/possible.
      
      E.g. in the case of "0,2-3" (CPU 1 is not available), the function
      returned 1 instead of 3.
      
      This patch fixes the function by making it parse all CPU ranges and
      elements.
      Signed-off-by: NMartynas Pumputis <m@lambda.lt>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1bb54c40
    • M
      bnxt_en: Disable interrupts when allocating CP rings or NQs. · 5e66e35a
      Michael Chan 提交于
      When calling firmware to allocate a CP ring or NQ, an interrupt associated
      with that ring may be generated immediately before the doorbell is even
      setup after the firmware call returns.  When servicing the interrupt, the
      driver may crash when trying to access the doorbell.
      
      Fix it by disabling interrupt on that vector until the doorbell is
      set up.
      
      Fixes: 697197e5 ("bnxt_en: Re-structure doorbells.")
      Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e66e35a
    • D
      Merge branch 'ieee802154-for-davem-2019-01-31' of... · da0e5171
      David S. Miller 提交于
      Merge branch 'ieee802154-for-davem-2019-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2019-01-31
      
      An update from ieee802154 for your *net* tree.
      
      I waited a while to see if anything else comes up, but it seems this time
      we only have one fixup patch for the -rc rounds.
      Colin fixed some indentation in the mcr20a drivers. That's about it.
      
      If there are any problems with taking these two before the final 5.0 let
      me know.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da0e5171
    • E
      rds: fix refcount bug in rds_sock_addref · 6fa19f56
      Eric Dumazet 提交于
      syzbot was able to catch a bug in rds [1]
      
      The issue here is that the socket might be found in a hash table
      but that its refcount has already be set to 0 by another cpu.
      
      We need to use refcount_inc_not_zero() to be safe here.
      
      [1]
      
      refcount_t: increment on 0; use-after-free.
      WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
      WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
       panic+0x2cb/0x65c kernel/panic.c:214
       __warn.cold+0x20/0x48 kernel/panic.c:571
       report_bug+0x263/0x2b0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       fixup_bug arch/x86/kernel/traps.c:173 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
      RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
      RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
      Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
      RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
      RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
      RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
      R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
      R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
       sock_hold include/net/sock.h:647 [inline]
       rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
       rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
       rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
       rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
       rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
       rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xdd/0x130 net/socket.c:631
       __sys_sendto+0x387/0x5f0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto net/socket.c:1796 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
       do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458089
      Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
      RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
      RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
      R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff
      
      Fixes: cc4dfb7f ("rds: fix two RCU related problems")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: rds-devel@oss.oracle.com
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fa19f56