1. 03 July 2021, 18 commits
  2. 15 June 2021, 17 commits
    • net: caif: add proper error handling · b31be42a
      Pavel Skripkin authored
      stable inclusion
      from stable-5.10.43
      commit d6db727457dd29938524f04b301c83ac67cccb87
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      commit a2805dca upstream.
      
      caif_enroll_dev() can fail in some cases. Ignoring
      these failures can lead to a memory leak, because the
      link_support pointer is never assigned anywhere.
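      A userspace sketch of the fix's shape (enroll_dev() and register_dev() are illustrative stand-ins, not the actual caif functions): when enrollment fails, the caller still owns link_support and must free it on the error path.

```c
#include <stdlib.h>

struct cflayer { int id; };

static struct cflayer *registered; /* stand-in for the framework's reference */

/* Stand-in for caif_enroll_dev(): on failure it does NOT take
 * ownership of link_support, so the caller must free it. */
static int enroll_dev(struct cflayer *link_support, int fail)
{
    if (fail)
        return -1;
    registered = link_support; /* success: framework now holds the pointer */
    return 0;
}

static int register_dev(int fail)
{
    struct cflayer *link_support = calloc(1, sizeof(*link_support));
    if (!link_support)
        return -1;
    int err = enroll_dev(link_support, fail);
    if (err) {
        free(link_support); /* the fix: no longer leaked on failure */
        return err;
    }
    return 0;
}
```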
      
      Fixes: 7c18d220 ("caif: Restructure how link caif link layer enroll")
      Cc: stable@vger.kernel.org
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b31be42a
    • net: caif: added cfserl_release function · 4c29ce11
      Pavel Skripkin authored
      stable inclusion
      from stable-5.10.43
      commit dac53568c6ac4c4678e6c2f4e66314b99ec85f03
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      commit bce130e7 upstream.
      
      Added cfserl_release() function.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      4c29ce11
    • bus: ti-sysc: Fix am335x resume hang for usb otg module · 9f181b28
      Tony Lindgren authored
      stable inclusion
      from stable-5.10.43
      commit d551b8e857775a6ea48f365d9611fe5c470008a3
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 4d7b324e ]
      
      On am335x, suspend and resume only works once, and the system hangs if
      suspend is attempted again. However, it turns out suspend and resume work
      fine multiple times if the USB OTG driver for the musb controller is loaded.
      
      The issue is caused by the interconnect target module losing context
      during suspend, and it needs a restore on resume to be reconfigured again,
      as debugged earlier by Dave Gerlach <d-gerlach@ti.com>.
      
      There are also other modules that need a restore on resume, like gpmc as
      noted by Dave. So let's add a common way to restore an interconnect
      target module based on a quirk flag. For now, let's enable the quirk for
      am335x otg only to fix the suspend and resume issue.
      
      As gpmc is not causing hangs based on tests with BeagleBone, let's patch
      gpmc separately. For gpmc, we also need a hardware reset done before
      restore according to Dave.
      
      To reinit the modules, we decouple system suspend from PM runtime. We
      replace calls to pm_runtime_force_suspend() and pm_runtime_force_resume()
      with direct calls to internal functions and rely on the driver internal
      state. There is no point in trying to handle complex system suspend and
      resume quirks via PM runtime.
      
      This issue should have already been noticed with commit 1819ef2e
      ("bus: ti-sysc: Use swsup quirks also for am335x musb"), when quirk
      handling was added for am335x otg swsup. But it went unnoticed
      because having the musb driver loaded hides the issue, and suspend and
      resume work once without the driver loaded.
      
      Fixes: 1819ef2e ("bus: ti-sysc: Use swsup quirks also for am335x musb")
      Suggested-by: Dave Gerlach <d-gerlach@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      9f181b28
    • net/mlx5: DR, Create multi-destination flow table with level less than 64 · c912d5a8
      Yevgeny Kliteynik authored
      stable inclusion
      from stable-5.10.43
      commit 2a8cda3867cd06fbc3f414a78e1c692f973d21e4
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 216214c6 ]
      
      A flow table that contains flows pointing to multiple flow tables or
      multiple TIRs must have a level lower than 64. In our case this applies
      to the multi-destination flow table.
      Fix the level of the created table to comply with the HW spec
      definitions, while still making sure that its level is lower than that
      of the SW-owned tables, so that it is possible to point from the
      multi-destination FW table to SW tables.
      
      Fixes: 34583bee ("net/mlx5: DR, Create multi-destination table for SW-steering use")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: Alex Vesker <valex@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      c912d5a8
    • net/tls: Fix use-after-free after the TLS device goes down and up · aa3905c0
      Maxim Mikityanskiy authored
      stable inclusion
      from stable-5.10.43
      commit f1d4184f128dede82a59a841658ed40d4e6d3aa2
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit c55dcdd4 ]
      
      When a netdev with active TLS offload goes down, tls_device_down is
      called to stop the offload and tear down the TLS context. However, the
      socket stays alive and still points to the TLS context, which is now
      deallocated. If the netdev goes up while the connection is still active,
      and the data flow resumes after a number of TCP retransmissions, this
      leads to a use-after-free of the TLS context.
      
      This commit addresses this bug by keeping the context alive until its
      normal destruction, and implements the necessary fallbacks, so that the
      connection can resume in software (non-offloaded) kTLS mode.
      
      On the TX side tls_sw_fallback is used to encrypt all packets. The RX
      side already has all the necessary fallbacks, because receiving
      non-decrypted packets is supported. The thing needed on the RX side is
      to block resync requests, which are normally produced after receiving
      non-decrypted packets.
      
      The necessary synchronization is implemented for a graceful teardown:
      first the fallbacks are deployed, then the driver resources are released
      (it used to be possible to have a tls_dev_resync after tls_dev_del).
      
      A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
      mode. It's used to skip the RX resync logic completely, as it becomes
      useless, and some objects may be released (for example, resync_async,
      which is allocated and freed by the driver).
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      aa3905c0
    • net/tls: Replace TLS_RX_SYNC_RUNNING with RCU · ea55ff3c
      Maxim Mikityanskiy authored
      stable inclusion
      from stable-5.10.43
      commit 874ece252ed269f5ac1f55167a3f2735ab0f249f
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 05fc8b6c ]
      
      RCU synchronization is guaranteed to finish in finite time, unlike a
      busy loop that polls a flag. This patch is a preparation for the bugfix
      in the next patch, where the same synchronize_net() call will also be
      used to sync with the TX datapath.
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      ea55ff3c
    • net: usb: cdc_ncm: don't spew notifications · 5146c41f
      Grant Grundler authored
      stable inclusion
      from stable-5.10.43
      commit 70df000fb8808e2ea63e6beb2ff570c083592614
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit de658a19 ]
      
      RTL8156 sends notifications about every 32ms.
      Only display/log notifications when something changes.
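      The dedup idea can be sketched in plain C (struct and function names are illustrative, not the driver's code): cache the last reported link state and speeds, and log only when a notification actually differs from the cached state.

```c
#include <stdbool.h>

/* Last state seen; notifications matching it are duplicates. */
struct ncm_log_state {
    bool have_prev;
    bool link_up;
    unsigned rx_mbps, tx_mbps;
};

/* Returns true if this notification should be logged. */
static bool should_log(struct ncm_log_state *st,
                       bool link_up, unsigned rx_mbps, unsigned tx_mbps)
{
    if (st->have_prev && st->link_up == link_up &&
        st->rx_mbps == rx_mbps && st->tx_mbps == tx_mbps)
        return false;            /* repeat notification every 32ms: stay quiet */
    st->have_prev = true;
    st->link_up = link_up;
    st->rx_mbps = rx_mbps;
    st->tx_mbps = tx_mbps;
    return true;                 /* state changed: worth a log line */
}

static struct ncm_log_state g_state;

static bool note_link(bool up, unsigned rx, unsigned tx)
{
    return should_log(&g_state, up, rx, tx);
}
```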
      
      This issue has been reported by others:
      	https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472
      	https://lkml.org/lkml/2020/8/27/1083
      
      ...
      [785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd
      [785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00
      [785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
      [785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN
      [785962.929954] usb 1-1: Manufacturer: Realtek
      [785962.929956] usb 1-1: SerialNumber: 000000001
      [785962.991755] usbcore: registered new interface driver cdc_ether
      [785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15
      [785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384
      [785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384
      [785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15
      [785963.019211] usbcore: registered new interface driver cdc_ncm
      [785963.023856] usbcore: registered new interface driver cdc_wdm
      [785963.025461] usbcore: registered new interface driver cdc_mbim
      [785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0
      [785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
      [785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
      [785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
      ...
      
      This is about 2 KB per second and will overwrite the entire contents of a
      1 MB dmesg buffer in under 10 minutes, rendering it useless for debugging
      many kernel problems.
      
      This is also an extra 180 MB/day in /var/log (over 1 GB per week),
      rendering the majority of those logs useless too.
      
      When the link is up (the expected state), the spew rate is more than
      2x higher:
      ...
      [786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
      [786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
      [786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
      [786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
      [786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
      [786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
      ...
      
      Chrome OS cannot support RTL8156 until this is fixed.
      Signed-off-by: Grant Grundler <grundler@chromium.org>
      Reviewed-by: Hayes Wang <hayeswang@realtek.com>
      Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      5146c41f
    • SUNRPC: More fixes for backlog congestion · b73c5591
      Trond Myklebust authored
      stable inclusion
      from stable-5.10.42
      commit 899b5131e74cd09ddd204addf763deaf6870ab57
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      commit e86be3a0 upstream.
      
      Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well
      as for socket-based transports.
      Ensure we always initialise the request after waking up from the backlog
      list.
      
      Fixes: e877a88d ("SUNRPC in case of backlog, hand free slots directly to waiting task")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b73c5591
    • net: zero-initialize tc skb extension on allocation · 97bc331b
      Vlad Buslov authored
      stable inclusion
      from stable-5.10.42
      commit ac493452e937b8939eaf2d24cac51a4804b6c20e
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 9453d45e ]
      
      The function skb_ext_add() doesn't initialize the created skb extension
      with any value and leaves that up to the user. However, since an
      extension of type TC_SKB_EXT originally contained only a single value,
      tc_skb_ext->chain, its users used to just assign the chain value without
      first setting the whole extension memory to zero. This assumption changed
      when the TC_SKB_EXT extension was extended with additional fields, but
      not all users were updated to initialize the new fields, which leads to
      use of uninitialized memory afterwards. UBSAN log:
      
      [  778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28
      [  778.301495] load of value 107 is not a valid value for type '_Bool'
      [  778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2
      [  778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  778.307901] Call Trace:
      [  778.308680]  <IRQ>
      [  778.309358]  dump_stack+0xbb/0x107
      [  778.310307]  ubsan_epilogue+0x5/0x40
      [  778.311167]  __ubsan_handle_load_invalid_value.cold+0x43/0x48
      [  778.312454]  ? memset+0x20/0x40
      [  778.313230]  ovs_flow_key_extract.cold+0xf/0x14 [openvswitch]
      [  778.314532]  ovs_vport_receive+0x19e/0x2e0 [openvswitch]
      [  778.315749]  ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch]
      [  778.317188]  ? create_prof_cpu_mask+0x20/0x20
      [  778.318220]  ? arch_stack_walk+0x82/0xf0
      [  778.319153]  ? secondary_startup_64_no_verify+0xb0/0xbb
      [  778.320399]  ? stack_trace_save+0x91/0xc0
      [  778.321362]  ? stack_trace_consume_entry+0x160/0x160
      [  778.322517]  ? lock_release+0x52e/0x760
      [  778.323444]  netdev_frame_hook+0x323/0x610 [openvswitch]
      [  778.324668]  ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch]
      [  778.325950]  __netif_receive_skb_core+0x771/0x2db0
      [  778.327067]  ? lock_downgrade+0x6e0/0x6f0
      [  778.328021]  ? lock_acquire+0x565/0x720
      [  778.328940]  ? generic_xdp_tx+0x4f0/0x4f0
      [  778.329902]  ? inet_gro_receive+0x2a7/0x10a0
      [  778.330914]  ? lock_downgrade+0x6f0/0x6f0
      [  778.331867]  ? udp4_gro_receive+0x4c4/0x13e0
      [  778.332876]  ? lock_release+0x52e/0x760
      [  778.333808]  ? dev_gro_receive+0xcc8/0x2380
      [  778.334810]  ? lock_downgrade+0x6f0/0x6f0
      [  778.335769]  __netif_receive_skb_list_core+0x295/0x820
      [  778.336955]  ? process_backlog+0x780/0x780
      [  778.337941]  ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core]
      [  778.339613]  ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0
      [  778.341033]  ? kvm_clock_get_cycles+0x14/0x20
      [  778.342072]  netif_receive_skb_list_internal+0x5f5/0xcb0
      [  778.343288]  ? __kasan_kmalloc+0x7a/0x90
      [  778.344234]  ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core]
      [  778.345676]  ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core]
      [  778.347140]  ? __netif_receive_skb_list_core+0x820/0x820
      [  778.348351]  ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core]
      [  778.349688]  ? napi_gro_flush+0x26c/0x3c0
      [  778.350641]  napi_complete_done+0x188/0x6b0
      [  778.351627]  mlx5e_napi_poll+0x373/0x1b80 [mlx5_core]
      [  778.352853]  __napi_poll+0x9f/0x510
      [  778.353704]  ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core]
      [  778.355158]  net_rx_action+0x34c/0xa40
      [  778.356060]  ? napi_threaded_poll+0x3d0/0x3d0
      [  778.357083]  ? sched_clock_cpu+0x18/0x190
      [  778.358041]  ? __common_interrupt+0x8e/0x1a0
      [  778.359045]  __do_softirq+0x1ce/0x984
      [  778.359938]  __irq_exit_rcu+0x137/0x1d0
      [  778.360865]  irq_exit_rcu+0xa/0x20
      [  778.361708]  common_interrupt+0x80/0xa0
      [  778.362640]  </IRQ>
      [  778.363212]  asm_common_interrupt+0x1e/0x40
      [  778.364204] RIP: 0010:native_safe_halt+0xe/0x10
      [  778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00
      [  778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246
      [  778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468
      [  778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e
      [  778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb
      [  778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000
      [  778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000
      [  778.378491]  ? rcu_eqs_enter.constprop.0+0xb8/0xe0
      [  778.379606]  ? default_idle_call+0x5e/0xe0
      [  778.380578]  default_idle+0xa/0x10
      [  778.381406]  default_idle_call+0x96/0xe0
      [  778.382350]  do_idle+0x3d4/0x550
      [  778.383153]  ? arch_cpu_idle_exit+0x40/0x40
      [  778.384143]  cpu_startup_entry+0x19/0x20
      [  778.385078]  start_kernel+0x3c7/0x3e5
      [  778.385978]  secondary_startup_64_no_verify+0xb0/0xbb
      
      Fix the issue by providing a new function, tc_skb_ext_alloc(), that
      allocates the tc skb extension and initializes its memory to 0 before
      returning it to the caller. Change all existing users to use the new API
      instead of calling skb_ext_add() directly.
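      A userspace sketch of the wrapper's behavior (the struct layout and fake_skb_ext_add() are illustrative stand-ins for the kernel types, not the real API): allocate, then zero, so callers never see stale memory in the new fields.

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* Illustrative layout only; the real struct lives in the kernel headers. */
struct tc_skb_ext {
    uint32_t chain;
    uint16_t mru;
    uint8_t  post_ct;
};

/* Stand-in for skb_ext_add(): returns raw storage and, like the real
 * helper, does not zero it. 0xA5 simulates stale memory contents. */
static struct tc_skb_ext *fake_skb_ext_add(void)
{
    struct tc_skb_ext *ext = malloc(sizeof(*ext));
    if (ext)
        memset(ext, 0xA5, sizeof(*ext));
    return ext;
}

/* The new wrapper, in spirit: allocate, then zero before returning. */
static struct tc_skb_ext *tc_skb_ext_alloc(void)
{
    struct tc_skb_ext *ext = fake_skb_ext_add();
    if (ext)
        memset(ext, 0, sizeof(*ext));
    return ext;
}

/* Helper for checking that every field comes back zeroed. */
static int ext_is_zeroed(void)
{
    struct tc_skb_ext *ext = tc_skb_ext_alloc();
    int ok = ext && ext->chain == 0 && ext->mru == 0 && ext->post_ct == 0;
    free(ext);
    return ok;
}
```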
      
      Fixes: 038ebb1a ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
      Fixes: d29334c1 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Acked-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      97bc331b
    • net: sched: fix tx action rescheduling issue during deactivation · 006771a8
      Yunsheng Lin authored
      stable inclusion
      from stable-5.10.42
      commit 2f23d5bcd9f89c239da83abd6270f5f0d9dd95bc
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 102b55ee ]
      
      Currently qdisc_run() checks STATE_DEACTIVATED of a lockless
      qdisc before calling __qdisc_run(), which ultimately clears
      STATE_MISSED once all the skbs are dequeued. If STATE_DEACTIVATED
      is set before clearing STATE_MISSED, there may be a rescheduling
      of net_tx_action() at the end of qdisc_run_end(), see below:
      
      CPU0(net_tx_action)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
                .                   .                     .
                .            set STATE_MISSED             .
                .           __netif_schedule()            .
                .                   .           set STATE_DEACTIVATED
                .                   .                qdisc_reset()
                .                   .                     .
                .<---------------   .              synchronize_net()
      clear __QDISC_STATE_SCHED  |  .                     .
                .                |  .                     .
                .                |  .            some_qdisc_is_busy()
                .                |  .               return *false*
                .                |  .                     .
        test STATE_DEACTIVATED   |  .                     .
      __qdisc_run() *not* called |  .                     .
                .                |  .                     .
         test STATE_MISSED       |  .                     .
       __netif_schedule()--------|  .                     .
                .                   .                     .
                .                   .                     .
      
      __qdisc_run() is not called by net_tx_action() on CPU0 because
      CPU2 has set the STATE_DEACTIVATED flag during dev_deactivate(), and
      STATE_MISSED is only cleared in __qdisc_run(); __netif_schedule()
      is then called at the end of qdisc_run_end(), causing the tx action
      rescheduling problem.
      
      qdisc_run() called by net_tx_action() runs in softirq context,
      which should have the same semantics as the qdisc_run() called by
      __dev_xmit_skb() under rcu_read_lock_bh(). And since there is a
      synchronize_net() between the STATE_DEACTIVATED flag being set and
      qdisc_reset()/some_qdisc_is_busy() in dev_deactivate(), we can safely
      bail out for a deactivated lockless qdisc in net_tx_action(), and
      qdisc_reset() will reset all skbs not yet dequeued.
      
      So add the rcu_read_lock() explicitly to protect the qdisc_run()
      and do the STATE_DEACTIVATED checking in net_tx_action() before
      calling qdisc_run_begin(). Another option is to do the checking in
      the qdisc_run_end(), but it will add unnecessary overhead for
      non-tx_action case, because __dev_queue_xmit() will not see qdisc
      with STATE_DEACTIVATED after synchronize_net(), the qdisc with
      STATE_DEACTIVATED can only be seen by net_tx_action() because of
      __netif_schedule().
      
      The STATE_DEACTIVATED check in qdisc_run() is there to avoid a race
      between net_tx_action() and qdisc_reset(), see:
      commit d518d2ed ("net/sched: fix race between deactivation
      and dequeue for NOLOCK qdisc"). As the bailout added above for
      deactivated lockless qdiscs in net_tx_action() provides better
      protection for the race without calling qdisc_run() at all,
      remove the STATE_DEACTIVATED check from qdisc_run().
      
      After qdisc_reset(), there is no skb in qdisc to be dequeued, so
      clear the STATE_MISSED in dev_reset_queue() too.
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      V8: Clearing STATE_MISSED before calling __netif_schedule() has
          avoided the endless rescheduling problem, but there may still
          be an unnecessary reschedule, so adjust the commit log.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      006771a8
    • net: sched: fix packet stuck problem for lockless qdisc · e3ee623d
      Yunsheng Lin authored
      stable inclusion
      from stable-5.10.42
      commit 21c71510925308a1d81513e1519165c063d1b57c
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit a90c57f2 ]
      
      The lockless qdisc has the following concurrency problem:
          cpu0                 cpu1
           .                     .
      q->enqueue                 .
           .                     .
      qdisc_run_begin()          .
           .                     .
      dequeue_skb()              .
           .                     .
      sch_direct_xmit()          .
           .                     .
           .                q->enqueue
           .             qdisc_run_begin()
           .            return and do nothing
           .                     .
      qdisc_run_end()            .
      
      cpu1 enqueues a skb without calling __qdisc_run() because cpu0
      has not released the lock yet and spin_trylock() returns false
      for cpu1 in qdisc_run_begin(), and cpu0 does not see the skb
      enqueued by cpu1 when calling dequeue_skb(), because cpu1 may
      enqueue the skb after cpu0 calls dequeue_skb() and before
      cpu0 calls qdisc_run_end().
      
      The lockless qdisc has another concurrency problem when
      tx_action is involved:
      
      cpu0(serving tx_action)     cpu1             cpu2
                .                   .                .
                .              q->enqueue            .
                .            qdisc_run_begin()       .
                .              dequeue_skb()         .
                .                   .            q->enqueue
                .                   .                .
                .             sch_direct_xmit()      .
                .                   .         qdisc_run_begin()
                .                   .       return and do nothing
                .                   .                .
       clear __QDISC_STATE_SCHED    .                .
       qdisc_run_begin()            .                .
       return and do nothing        .                .
                .                   .                .
                .            qdisc_run_end()         .
      
      This patch fixes the above data race by:
      1. If the first spin_trylock() returns false and STATE_MISSED is
         not set, set STATE_MISSED and retry another spin_trylock() in
         case the other CPU may not see STATE_MISSED after it releases the
         lock.
      2. Reschedule if STATE_MISSED is set after the lock is released
         at the end of qdisc_run_end().
      
      For the tx_action case, STATE_MISSED is also set when cpu1 is at the
      end of qdisc_run_end(), so tx_action will be rescheduled again
      to dequeue the skb enqueued by cpu2.
      
      Clear STATE_MISSED before retrying a dequeue when dequeuing
      returns NULL, in order to reduce the overhead of the second
      spin_trylock() and the __netif_schedule() call.
      
      Also clear STATE_MISSED before calling __netif_schedule()
      at the end of qdisc_run_end(), to avoid doing another round of
      dequeuing in pfifo_fast_dequeue().
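      The trylock-plus-STATE_MISSED protocol described above can be sketched with C11 atomics (a single-process illustration of the idea, not the kernel's implementation; the atomic flag stands in for the qdisc seqlock and the counter stands in for __netif_schedule()):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag seqlock = ATOMIC_FLAG_INIT; /* stand-in for the qdisc lock */
static atomic_bool state_missed;
static int reschedules;                        /* counts "__netif_schedule()" */

static bool qdisc_run_begin(void)
{
    if (!atomic_flag_test_and_set(&seqlock))
        return true;                   /* got the lock: we will dequeue */
    if (atomic_load(&state_missed))
        return false;                  /* holder will reschedule anyway */
    atomic_store(&state_missed, true); /* record the missed attempt */
    /* Retry once: the holder may have released the lock before it
     * could see STATE_MISSED. */
    return !atomic_flag_test_and_set(&seqlock);
}

static void qdisc_run_end(void)
{
    atomic_flag_clear(&seqlock);
    /* Clear MISSED first, then reschedule if anyone missed us. */
    if (atomic_exchange(&state_missed, false))
        reschedules++;
}
```

      The second trylock closes the window where the lock holder releases the lock after our first failed attempt but before it could observe STATE_MISSED.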
      
      The performance impact of this patch, tested using pktgen and
      dummy netdev with pfifo_fast qdisc attached:
      
       threads  without+this_patch   with+this_patch      delta
          1        2.61Mpps            2.60Mpps           -0.3%
          2        3.97Mpps            3.82Mpps           -3.7%
          4        5.62Mpps            5.59Mpps           -0.5%
          8        2.78Mpps            2.77Mpps           -0.3%
         16        2.22Mpps            2.22Mpps           -0.0%
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Tested-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      e3ee623d
    • net: really orphan skbs tied to closing sk · 62127e1d
      Paolo Abeni authored
      stable inclusion
      from stable-5.10.42
      commit 1f1b431a4fcd96a6be85ab5a61bd874960d182cf
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 098116e7 ]
      
      If the owning socket is shutting down - e.g. the sock reference
      count has already dropped to 0 and only sk_wmem_alloc is keeping
      the sock alive - skb_orphan_partial() becomes a no-op.
      
      When forwarding packets over veth with GRO enabled, the above
      causes refcount errors.
      
      This change addresses the issue with a plain skb_orphan() call
      in the critical scenario.
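      The decision can be sketched as follows (struct sock_sketch and orphan_for() are illustrative, not kernel types): once the sock's reference count has dropped to zero, a partial orphan would be a no-op, so a full skb_orphan() is used instead.

```c
/* Illustrative stand-in for the relevant bits of struct sock. */
struct sock_sketch {
    int refcnt; /* like sk_refcnt: 0 means the sock is shutting down */
    int wmem;   /* like sk_wmem_alloc: may still keep the sock alive */
};

enum orphan_kind { ORPHAN_PARTIAL, ORPHAN_FULL };

/* Pick the orphaning strategy for an skb tied to this sock. */
static enum orphan_kind orphan_for(const struct sock_sketch *sk)
{
    if (sk->refcnt > 0)
        return ORPHAN_PARTIAL; /* keep wmem accounting tied to the sock */
    return ORPHAN_FULL;        /* closing sock: detach the skb entirely */
}
```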
      
      Fixes: 9adc89af ("net: let skb_orphan_partial wake-up waiters.")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      62127e1d
    • linux/bits.h: fix compilation error with GENMASK · cffb222b
      Rikard Falkeborn authored
      stable inclusion
      from stable-5.10.42
      commit 1354ec840899e87259286cc844d4c161ea86fae7
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit f747e666 ]
      
      GENMASK() has an input check which uses __builtin_choose_expr() to
      enable a compile time sanity check of its inputs if they are known at
      compile time.
      
      However, it turns out that __builtin_constant_p() does not always return
      a compile time constant [0].  It was thought this problem was fixed with
      gcc 4.9 [1], but apparently this is not the case [2].
      
      Switch to use __is_constexpr() instead which always returns a compile time
      constant, regardless of its inputs.
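      Condensed for illustration (the macros below mirror the kernel's bits/const headers, with the GENMASK input check elided for brevity):

```c
/* The __is_constexpr() trick: (long)(x) * 0l is an integer constant
 * expression only if x is, which makes the (void *) cast a null pointer
 * constant and steers the ?: result type to int *. Otherwise the result
 * type is void *, and sizeof(void) is 1 under the GNU C extension, so
 * the comparison with sizeof(int) fails. */
#define __is_constexpr(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

#define BITS_PER_LONG (8 * sizeof(long))

/* GENMASK(h, l): a mask with bits h..l (inclusive) set. */
#define GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
```

      Because __is_constexpr() is itself always an integer constant expression, it is safe to feed it to __builtin_choose_expr() regardless of whether its argument is known at compile time, which is exactly what __builtin_constant_p() failed to guarantee.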
      
      Link: https://lore.kernel.org/lkml/42b4342b-aefc-a16a-0d43-9f9c0d63ba7a@rasmusvillemoes.dk [0]
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19449 [1]
      Link: https://lore.kernel.org/lkml/1ac7bbc2-45d9-26ed-0b33-bf382b8d858b@I-love.SAKURA.ne.jp [2]
      Link: https://lkml.kernel.org/r/20210511203716.117010-1-rikard.falkeborn@gmail.com
      Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Yury Norov <yury.norov@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      cffb222b
    • netfilter: flowtable: Remove redundant hw refresh bit · f85c5227
      Roi Dayan authored
      stable inclusion
      from stable-5.10.42
      commit 6d6bc8c75290866e59ee25ab6bb0114eb166b980
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      commit c07531c0 upstream.
      
      Offloading conns could fail for multiple reasons, and a hw refresh bit
      is set to try to re-offload the conn on the next sw packet.
      But there are cases, now and in the future, where the hw refresh bit
      is not set yet a refresh could succeed.
      Remove the hw refresh bit and do an offload refresh whenever requested.
      There won't be a new work entry if a work is already pending
      anyway, as there is the hw pending bit.
      
      Fixes: 8b3646d6 ("net/sched: act_ct: Support refreshing the flow table entries")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f85c5227
    • E
      {net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table · b4bb871d
      Committed by Eli Cohen
      stable inclusion
      from stable-5.10.42
      commit 89a0e388c6f2ea73434727ca4780150874e225a7
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      commit 7c9f131f upstream.
      
      net/mlx5: Expose MPFS configuration API
      
      MPFS is the multi physical function switch that bridges traffic between
      the physical port and any physical functions associated with it. The
      driver is required to add or remove MAC entries to properly forward
      incoming traffic to the correct physical function.
      
      We export the API to control MPFS so that other drivers, such as
      mlx5_vdpa, are able to add the MAC addresses of their network interfaces.
      
      The MAC address of the vdpa interface must be configured into the MPFS L2
      address. Failing to do so could cause, in some NIC configurations, failure
      to forward packets to the vdpa network device instance.
      
      Fix this by adding calls to update the MPFS table.
      
      CC: <mst@redhat.com>
      CC: <jasowang@redhat.com>
      CC: <virtualization@lists.linux-foundation.org>
      Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
      Signed-off-by: Eli Cohen <elic@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b4bb871d
    • R
      drivers: base: Fix device link removal · b9dbe96f
      Committed by Rafael J. Wysocki
      stable inclusion
      from stable-5.10.42
      commit d007150b4e15bfcb8d36cfd88a5645d42e44d383
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      commit 80dd33cf upstream.
      
      When device_link_free() drops references to the supplier and
      consumer devices of the device link going away and the reference
      being dropped turns out to be the last one for any of those
      device objects, its ->release callback will be invoked and it
      may sleep which goes against the SRCU callback execution
      requirements.
      
      To address this issue, make the device link removal code carry out
      the device_link_free() actions preceded by SRCU synchronization from
      a separate work item (the "long" workqueue is used for that, because
      it does not matter when the device link memory is released and it may
      take time to get to that point) instead of using SRCU callbacks.
      
      While at it, make the code work analogously when SRCU is not enabled
      to reduce the differences between the SRCU and non-SRCU cases.
      
      Fixes: 843e600b ("driver core: Fix sleeping in invalid context during device link deletion")
      Cc: stable <stable@vger.kernel.org>
      Reported-by: chenxiang (M) <chenxiang66@hisilicon.com>
      Tested-by: chenxiang (M) <chenxiang66@hisilicon.com>
      Reviewed-by: Saravana Kannan <saravanak@google.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Link: https://lore.kernel.org/r/5722787.lOV4Wx5bFT@kreacher
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b9dbe96f
    • M
      mac80211: properly handle A-MSDUs that start with an RFC 1042 header · 79a5f094
      Committed by Mathy Vanhoef
      stable inclusion
      from stable-5.10.42
      commit e3561d5af01c2c49ed52c8d2644be752d5b13ec2
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      commit a1d5ff56 upstream.
      
      Properly parse A-MSDUs whose first 6 bytes happen to equal an RFC 1042
      header. This can occur in practice when the destination MAC address
      equals AA:AA:03:00:00:00. More importantly, this simplifies the next
      patch to mitigate A-MSDU injection attacks.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
      Link: https://lore.kernel.org/r/20210511200110.0b2b886492f0.I23dd5d685fe16d3b0ec8106e8f01b59f499dffed@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      79a5f094
  3. 04 Jun, 2021 5 commits