  1. 23 Jul, 2021 · 3 commits
  2. 22 Jul, 2021 · 5 commits
  3. 21 Jul, 2021 · 2 commits
    • udp: check encap socket in __udp_lib_err · 9bfce73c
      Committed by Vadim Fedorenko
      Commit d26796ae ("udp: check udp sock encap_type in __udp_lib_err")
      added checks for encapsulated sockets, but it broke encapsulations
      that do not implement encap_err_lookup, e.g. ESP in UDP
      encapsulation. Fix it by calling encap_err_lookup only if the socket
      implements this method; otherwise treat it as a legal socket.
      
      Fixes: d26796ae ("udp: check udp sock encap_type in __udp_lib_err")
      Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
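The rule in this commit (only consult the lookup callback when the encapsulation provides one) can be sketched in user-space C. This is a minimal model, not the kernel code: `struct udp_sock_model`, the two sample callbacks and the convention that a lookup returns 0 on a match are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a UDP socket; not the kernel's struct. */
struct udp_sock_model {
	int encap_type;                 /* 0 means "not an encap socket" */
	int (*encap_err_lookup)(const struct udp_sock_model *sk,
				const void *pkt); /* may be NULL */
};

/* Sample callbacks. Assumed convention: 0 means the error matches sk. */
int lookup_accept(const struct udp_sock_model *sk, const void *pkt)
{
	(void)sk; (void)pkt;
	return 0;
}

int lookup_reject(const struct udp_sock_model *sk, const void *pkt)
{
	(void)sk; (void)pkt;
	return -1;
}

/* Returns 1 if the error should be delivered to sk, 0 if sk is skipped. */
int encap_sock_accepts_error(const struct udp_sock_model *sk, const void *pkt)
{
	if (!sk->encap_type)
		return 1;               /* plain UDP socket */
	if (!sk->encap_err_lookup)
		return 1;               /* no lookup implemented: legal socket */
	return sk->encap_err_lookup(sk, pkt) == 0;
}
```

In this model an encap socket without a callback (the ESP-in-UDP situation from the commit message) is accepted, while a socket whose callback rejects the packet is skipped.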
    • sctp: update active_key for asoc when old key is being replaced · 58acd100
      Committed by Xin Long
      syzbot reported a call trace:
      
        BUG: KASAN: use-after-free in sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
        Call Trace:
         sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
         sctp_set_owner_w net/sctp/socket.c:131 [inline]
         sctp_sendmsg_to_asoc+0x152e/0x2180 net/sctp/socket.c:1865
         sctp_sendmsg+0x103b/0x1d30 net/sctp/socket.c:2027
         inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:821
         sock_sendmsg_nosec net/socket.c:703 [inline]
         sock_sendmsg+0xcf/0x120 net/socket.c:723
      
      This is a use-after-free issue caused by not updating asoc->shkey after
      it was replaced in the key list asoc->endpoint_shared_keys, and the old
      key was freed.

      This patch fixes it by also updating active_key for asoc when the old
      key is being replaced with a new one. Note that this issue doesn't exist
      in sctp_auth_del_key_id(), as it's not allowed to delete the active_key
      from the asoc.
      
      Fixes: 1b1e0bc9 ("sctp: add refcnt support for sh_key")
      Reported-by: syzbot+b774577370208727d12b@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
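The idea of refreshing a cached pointer when a refcounted key is replaced can be sketched as follows. This is a simplified user-space model under stated assumptions: the key list is reduced to a single slot, and `shkey_model`, `assoc_model` and `assoc_replace_key()` are hypothetical names, not the SCTP implementation.

```c
#include <stdlib.h>

/* Hypothetical refcounted key; not the kernel's sctp_shared_key. */
struct shkey_model {
	int key_id;
	int refcnt;
};

struct shkey_model *shkey_new(int key_id)
{
	struct shkey_model *k = malloc(sizeof(*k));

	if (!k)
		return NULL;
	k->key_id = key_id;
	k->refcnt = 1;          /* reference owned by the caller */
	return k;
}

void shkey_hold(struct shkey_model *k)
{
	k->refcnt++;
}

void shkey_put(struct shkey_model *k)
{
	if (--k->refcnt == 0)
		free(k);
}

struct assoc_model {
	struct shkey_model *listed; /* one-slot "key list", holds one ref */
	struct shkey_model *active; /* cached active key, holds its own ref */
};

/* Replace the listed key; takes over the caller's reference on new_key. */
void assoc_replace_key(struct assoc_model *a, struct shkey_model *new_key)
{
	struct shkey_model *old = a->listed;

	a->listed = new_key;
	if (!old)
		return;
	if (a->active == old) {
		/* Without this step, a->active would dangle once old is freed. */
		shkey_hold(new_key);
		a->active = new_key;
		shkey_put(old); /* drop the cached reference */
	}
	shkey_put(old);         /* drop the list's reference */
}
```

If the `a->active == old` branch were left out, the last `shkey_put(old)` from the list would free a key that `a->active` still points to, which is the pattern behind the KASAN report.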
  4. 20 Jul, 2021 · 5 commits
    • ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions · 8fb4792f
      Committed by Paolo Abeni
      While running the self-tests on a KASAN-enabled kernel, I observed a
      slab-out-of-bounds splat very similar to the one reported in
      commit 821bbf79 ("ipv6: Fix KASAN: slab-out-of-bounds Read in
       fib6_nh_flush_exceptions").

      We additionally need to take care of fib6_metrics initialization
      failure when the caller provides an nh.

      The fix is similar: explicitly free the route instead of calling
      fib6_info_release on a half-initialized object.
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_skbmod: Skip non-Ethernet packets · 727d6a8b
      Committed by Peilin Ye
      Currently tcf_skbmod_act() assumes that packets use Ethernet as their L2
      protocol, which is not always the case.  As an example, for CAN devices:
      
      	$ ip link add dev vcan0 type vcan
      	$ ip link set up vcan0
      	$ tc qdisc add dev vcan0 root handle 1: htb
      	$ tc filter add dev vcan0 parent 1: protocol ip prio 10 \
      		matchall action skbmod swap mac
      
      Doing the above silently corrupts all the packets.  Do not perform skbmod
      actions for non-Ethernet packets.
      
      Fixes: 86da71b5 ("net_sched: Introduce skbmod action")
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
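A guard of the kind this commit describes can be sketched in user-space C. The packet structure and function name are hypothetical; only the two device-type values are taken from Linux (`ARPHRD_ETHER` is 1 and `ARPHRD_CAN` is 280), and they are redefined locally so the sketch builds anywhere.

```c
#include <string.h>

#define ETH_ALEN_MODEL      6
#define ARPHRD_ETHER_MODEL  1    /* same value as ARPHRD_ETHER in Linux */
#define ARPHRD_CAN_MODEL    280  /* same value as ARPHRD_CAN in Linux */

/* Hypothetical packet: device type plus the first 12 bytes of the frame. */
struct pkt_model {
	unsigned short dev_type;
	unsigned char data[2 * ETH_ALEN_MODEL]; /* dst MAC, then src MAC */
};

/* Swap dst/src MAC only on Ethernet; returns 1 if swapped, 0 if skipped. */
int swap_mac_if_ethernet(struct pkt_model *p)
{
	unsigned char tmp[ETH_ALEN_MODEL];

	if (p->dev_type != ARPHRD_ETHER_MODEL)
		return 0;       /* not Ethernet: leave the bytes untouched */

	memcpy(tmp, p->data, ETH_ALEN_MODEL);
	memcpy(p->data, p->data + ETH_ALEN_MODEL, ETH_ALEN_MODEL);
	memcpy(p->data + ETH_ALEN_MODEL, tmp, ETH_ALEN_MODEL);
	return 1;
}
```

Without the type check, the first 12 bytes of a CAN frame would be treated as two MAC addresses and swapped, which is the silent corruption the commit message refers to.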
    • net: bridge: do not replay fdb entries pointing towards the bridge twice · cbb56b03
      Committed by Vladimir Oltean
      This simple script:
      
      ip link add br0 type bridge
      ip link set swp2 master br0
      ip link set br0 address 00:01:02:03:04:05
      ip link del br0
      
      produces this result on a DSA switch:
      
      [  421.306399] br0: port 1(swp2) entered blocking state
      [  421.311445] br0: port 1(swp2) entered disabled state
      [  421.472553] device swp2 entered promiscuous mode
      [  421.488986] device swp2 left promiscuous mode
      [  421.493508] br0: port 1(swp2) entered disabled state
      [  421.886107] sja1105 spi0.1: port 1 failed to delete 00:01:02:03:04:05 vid 1 from fdb: -ENOENT
      [  421.894374] sja1105 spi0.1: port 1 failed to delete 00:01:02:03:04:05 vid 0 from fdb: -ENOENT
      [  421.943982] br0: port 1(swp2) entered blocking state
      [  421.949030] br0: port 1(swp2) entered disabled state
      [  422.112504] device swp2 entered promiscuous mode
      
      A very simplified view of what happens is:
      
      (1) the bridge port is created, and the bridge device inherits its MAC
          address
      
      (2) when joining, the bridge port (DSA) requests a replay of the
          addition of all FDB entries towards this bridge port and towards the
          bridge device itself. In fact, DSA calls br_fdb_replay() twice:
      
      	br_fdb_replay(br, brport_dev);
      	br_fdb_replay(br, br);
      
          DSA uses reference counting for the FDB entries. So the MAC address
          of the bridge is simply kept with refcount 2. When the bridge port
          leaves under normal circumstances, everything cancels out since the
          replay of the FDB entry deletion is also done twice per VLAN.
      
      (3) when the bridge MAC address changes, switchdev is notified of the
          deletion of the old address and of the insertion of the new one.
          But the old address does not really go away, since it had refcount
          2, and the new address is added "only" with refcount 1.
      
      (4) when the bridge port leaves now, it will replay a deletion of the
          FDB entries pointing towards the bridge twice. Then DSA will
          complain that it can't delete something that no longer exists.
      
      It is clear that the problem is that the FDB entries towards the bridge
      are replayed too many times, so let's fix that problem.
      
      Fixes: 63c51453 ("net: dsa: replay the local bridge FDB entries pointing to the bridge dev too")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20210719093916.4099032-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
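Steps (1) to (4) of this commit message can be replayed against a tiny refcounted table. This is a user-space model with hypothetical names (`fdb_table`, `fdb_add()`, `fdb_del()`); it only mimics the reference counting described in the message, not the DSA or bridge code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define FDB_SLOTS 4

/* Hypothetical refcounted FDB entry keyed by a MAC string. */
struct fdb_entry {
	char mac[18];
	int refcnt;     /* 0 means the slot is free */
};

struct fdb_table {
	struct fdb_entry e[FDB_SLOTS];
};

/* Add a reference to mac, creating the entry if needed. */
int fdb_add(struct fdb_table *t, const char *mac)
{
	int i, free_slot = -1;

	for (i = 0; i < FDB_SLOTS; i++) {
		if (t->e[i].refcnt > 0 && strcmp(t->e[i].mac, mac) == 0) {
			t->e[i].refcnt++;
			return 0;
		}
		if (t->e[i].refcnt == 0 && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -ENOSPC;
	snprintf(t->e[free_slot].mac, sizeof(t->e[free_slot].mac), "%s", mac);
	t->e[free_slot].refcnt = 1;
	return 0;
}

/* Drop one reference to mac; -ENOENT if no live entry exists. */
int fdb_del(struct fdb_table *t, const char *mac)
{
	int i;

	for (i = 0; i < FDB_SLOTS; i++) {
		if (t->e[i].refcnt > 0 && strcmp(t->e[i].mac, mac) == 0) {
			t->e[i].refcnt--;
			return 0;
		}
	}
	return -ENOENT;
}
```

Replaying the scenario on a zeroed table: two `fdb_add()` calls for the old address on join, then one `fdb_del()` of the old address and one `fdb_add()` of the new address on the MAC change, then two `fdb_del()` calls for the new address on leave. The second of those final deletions returns `-ENOENT`, matching the error in the log.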
    • ipv6: ip6_finish_output2: set sk into newly allocated nskb · 2d85a1b3
      Committed by Vasily Averin
      skb_set_owner_w() should set sk not to old skb but to new nskb.
      
      Fixes: 5796015f ("ipv6: allocate enough headroom in ip6_finish_output2()")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Link: https://lore.kernel.org/r/70c0744f-89ae-1869-7e3e-4fa292158f4b@virtuozzo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tcp_fastopen: fix data races around tfo_active_disable_stamp · 6f20c8ad
      Committed by Eric Dumazet
      tfo_active_disable_stamp is read and written locklessly.
      We need to annotate these accesses appropriately.
      
      Then, we need to perform the atomic_inc(tfo_active_disable_times)
      after the timestamp has been updated, and thus add barriers
      to make sure tcp_fastopen_active_should_disable() won't read
      a stale timestamp.
      
      Fixes: cf1ef3f0 ("net/tcp_fastopen: Disable active side TFO in certain scenarios")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
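The ordering requirement (publish the timestamp first, then bump the counter, and read them in the opposite order) can be sketched with C11 atomics. This is an analogy rather than the kernel change, which uses kernel primitives; the names and the fixed `timeout` argument are assumptions, and the real backoff computation is not modelled.

```c
#include <stdatomic.h>

/* Hypothetical globals standing in for the two fields in the commit. */
static _Atomic unsigned long disable_stamp_model;
static atomic_int disable_times_model;

/* Writer: store the timestamp, then publish it by bumping the counter. */
void tfo_disable_model(unsigned long now)
{
	atomic_store_explicit(&disable_stamp_model, now,
			      memory_order_relaxed);
	/* Release: the stamp store above is visible to anyone who
	 * observes the incremented counter with an acquire load. */
	atomic_fetch_add_explicit(&disable_times_model, 1,
				  memory_order_release);
}

/* Reader: returns 1 while "now" is still inside the disable window. */
int tfo_should_disable_model(unsigned long now, unsigned long timeout)
{
	unsigned long stamp;

	if (!atomic_load_explicit(&disable_times_model,
				  memory_order_acquire))
		return 0;       /* never disabled */

	stamp = atomic_load_explicit(&disable_stamp_model,
				     memory_order_relaxed);
	return now - stamp < timeout;
}
```

If the counter were incremented before the timestamp store, a reader could see a non-zero counter together with an old timestamp and decide the window had already expired.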
  5. 19 Jul, 2021 · 4 commits
    • netrom: Decrease sock refcount when sock timers expire · 517a16b1
      Committed by Nguyen Dinh Phi
      Commit 63346650 ("netrom: switch to sock timer API") switched to the
      sock timer API. It replaced mod_timer() with sk_reset_timer(), and
      del_timer() with sk_stop_timer().

      sk_reset_timer() increases the refcount of the sock if it is called
      on an inactive timer. Hence, when the timer expires, we need to
      decrease the refcount ourselves in the handler; otherwise the sock
      refcount will be unbalanced and the sock will never be freed.

      Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Reported-by: syzbot+10f1194569953b72f1ae@syzkaller.appspotmail.com
      Fixes: 63346650 ("netrom: switch to sock timer API")
      Signed-off-by: David S. Miller <davem@davemloft.net>
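The refcount imbalance can be illustrated with a small user-space model. The `_model` functions are hypothetical simplifications of the behaviour the commit message describes (arming an inactive timer takes a reference, stopping a pending timer drops it); they are not the kernel helpers themselves.

```c
/* Hypothetical socket with a single timer. */
struct sock_model {
	int refcnt;
	int timer_pending;
};

/* Arm the timer; takes a reference only if the timer was inactive. */
void sk_reset_timer_model(struct sock_model *sk)
{
	if (!sk->timer_pending) {
		sk->timer_pending = 1;
		sk->refcnt++;
	}
}

/* Cancel the timer; drops the reference only if it was still pending. */
void sk_stop_timer_model(struct sock_model *sk)
{
	if (sk->timer_pending) {
		sk->timer_pending = 0;
		sk->refcnt--;
	}
}

/* Handler that forgets the reference taken when the timer was armed. */
void handler_leaky(struct sock_model *sk)
{
	(void)sk;
}

/* Handler that drops the timer's reference once its work is done. */
void handler_fixed(struct sock_model *sk)
{
	sk->refcnt--;
}

/* Expiry: the timer is no longer pending when the handler runs. */
void timer_expire_model(struct sock_model *sk,
			void (*handler)(struct sock_model *))
{
	sk->timer_pending = 0;
	handler(sk);
}
```

Starting from `refcnt == 1`, arming the timer and letting it expire leaves the count at 2 with `handler_leaky` and at 1 with `handler_fixed`. Note that `sk_stop_timer_model()` cannot repair the leaky case afterwards, because the timer is no longer pending.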
    • sctp: trim optlen when it's a huge value in sctp_setsockopt · 2f3fdd8d
      Committed by Xin Long
      After commit ca84bd05 ("sctp: copy the optval from user space in
      sctp_setsockopt"), sctp_setsockopt allocates memory sized by optlen,
      so the allocation fails and an error is returned when user space
      passes a huge optlen.

      This breaks some sockopts, like SCTP_HMAC_IDENT, SCTP_RESET_STREAMS and
      SCTP_AUTH_KEY: before that commit, a huge optlen was trimmed to the
      largest value the sockopt needs instead of failing the allocation and
      returning an error.

      This patch fixes the allocation failure for a huge optlen from user
      space by trimming it, when necessary, to the largest size any sctp
      sockopt may need. That size comes from sctp_setsockopt_reset_streams()
      for SCTP_RESET_STREAMS, which is bigger than those for SCTP_HMAC_IDENT
      and SCTP_AUTH_KEY.
      
      Fixes: ca84bd05 ("sctp: copy the optval from user space in sctp_setsockopt")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
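Clamping a caller-supplied length before allocating can be sketched as follows. The cap `SOCKOPT_MAX_LEN_MODEL` is an arbitrary placeholder, not the bound the patch derives from SCTP_RESET_STREAMS, and the function names are hypothetical.

```c
#include <stdlib.h>

/* Placeholder cap, chosen only for this sketch. */
#define SOCKOPT_MAX_LEN_MODEL ((size_t)64 * 1024)

/* Trim an untrusted length to the largest size any option may need. */
size_t sockopt_trim_len(size_t optlen)
{
	return optlen > SOCKOPT_MAX_LEN_MODEL ? SOCKOPT_MAX_LEN_MODEL : optlen;
}

/* Allocate a zeroed buffer for the (trimmed) option value.
 * Returns NULL for a zero length or on allocation failure;
 * *alloc_len always receives the trimmed length. */
void *sockopt_alloc(size_t optlen, size_t *alloc_len)
{
	*alloc_len = sockopt_trim_len(optlen);
	if (*alloc_len == 0)
		return NULL;
	return calloc(1, *alloc_len);
}
```

With the clamp in place, a request claiming an absurd length still gets a buffer of bounded size, and the per-option handler can then decide how much of it is meaningful.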
    • net: sched: fix memory leak in tcindex_partial_destroy_work · f5051bce
      Committed by Pavel Skripkin
      Syzbot reported a memory leak in tcindex_set_parms(). The problem was
      a perfect hash that was not freed in tcindex_partial_destroy_work().

      In tcindex_set_parms() a new tcindex_data is allocated and some fields
      from the old one are copied to the new one, but not the perfect hash.
      Since tcindex_partial_destroy_work() is the destroy function for the
      old tcindex_data, we need to free the perfect hash there to avoid the
      memory leak.
      
      Reported-and-tested-by: syzbot+f0bbb2287b8993d4fa74@syzkaller.appspotmail.com
      Fixes: 331b7292 ("net: sched: RCU cls_tcindex")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix zero-copy head len calculation. · a17ad096
      Committed by Pravin B Shelar
      In some cases the skb head could be locked and the entire header
      data pulled from the skb. When skb_zerocopy() is called in such cases,
      the following BUG is triggered. This patch fixes it by copying the
      entire skb in such cases.
      This could be optimized in case it is a performance bottleneck.
      
      ---8<---
      kernel BUG at net/core/skbuff.c:2961!
      invalid opcode: 0000 [#1] SMP PTI
      CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           OE     5.4.0-77-generic #86-Ubuntu
      Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014
      RIP: 0010:skb_zerocopy+0x37a/0x3a0
      RSP: 0018:ffffbcc70013ca38 EFLAGS: 00010246
      Call Trace:
       <IRQ>
       queue_userspace_packet+0x2af/0x5e0 [openvswitch]
       ovs_dp_upcall+0x3d/0x60 [openvswitch]
       ovs_dp_process_packet+0x125/0x150 [openvswitch]
       ovs_vport_receive+0x77/0xd0 [openvswitch]
       netdev_port_receive+0x87/0x130 [openvswitch]
       netdev_frame_hook+0x4b/0x60 [openvswitch]
       __netif_receive_skb_core+0x2b4/0xc90
       __netif_receive_skb_one_core+0x3f/0xa0
       __netif_receive_skb+0x18/0x60
       process_backlog+0xa9/0x160
       net_rx_action+0x142/0x390
       __do_softirq+0xe1/0x2d6
       irq_exit+0xae/0xb0
       do_IRQ+0x5a/0xf0
       common_interrupt+0xf/0xf
      
      Code that triggered BUG:
      int
      skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen)
      {
              int i, j = 0;
              int plen = 0; /* length of skb->head fragment */
              int ret;
              struct page *page;
              unsigned int offset;
      
              BUG_ON(!from->head_frag && !hlen);

      Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 17 Jul, 2021 · 2 commits
  7. 16 Jul, 2021 · 5 commits
  8. 14 Jul, 2021 · 2 commits
  9. 13 Jul, 2021 · 2 commits
    • xdp, net: Fix use-after-free in bpf_xdp_link_release · 5acc7d3e
      Committed by Xuan Zhuo
      The problem occurs when dev_xdp_uninstall() is called between
      dev_get_by_index() and dev_xdp_attach_link(). In that case the xdp
      link is not detached automatically when dev is released. But
      link->dev already points to dev, so when the xdp link is released,
      dev is still accessed even though it has already been released.
      
      dev_get_by_index()        |
      link->dev = dev           |
                                |      rtnl_lock()
                                |      unregister_netdevice_many()
                                |          dev_xdp_uninstall()
                                |      rtnl_unlock()
      rtnl_lock();              |
      dev_xdp_attach_link()     |
      rtnl_unlock();            |
                                |      netdev_run_todo() // dev released
      bpf_xdp_link_release()    |
          /* access dev.        |
             use-after-free */  |
      
      [   45.966867] BUG: KASAN: use-after-free in bpf_xdp_link_release+0x3b8/0x3d0
      [   45.967619] Read of size 8 at addr ffff00000f9980c8 by task a.out/732
      [   45.968297]
      [   45.968502] CPU: 1 PID: 732 Comm: a.out Not tainted 5.13.0+ #22
      [   45.969222] Hardware name: linux,dummy-virt (DT)
      [   45.969795] Call trace:
      [   45.970106]  dump_backtrace+0x0/0x4c8
      [   45.970564]  show_stack+0x30/0x40
      [   45.970981]  dump_stack_lvl+0x120/0x18c
      [   45.971470]  print_address_description.constprop.0+0x74/0x30c
      [   45.972182]  kasan_report+0x1e8/0x200
      [   45.972659]  __asan_report_load8_noabort+0x2c/0x50
      [   45.973273]  bpf_xdp_link_release+0x3b8/0x3d0
      [   45.973834]  bpf_link_free+0xd0/0x188
      [   45.974315]  bpf_link_put+0x1d0/0x218
      [   45.974790]  bpf_link_release+0x3c/0x58
      [   45.975291]  __fput+0x20c/0x7e8
      [   45.975706]  ____fput+0x24/0x30
      [   45.976117]  task_work_run+0x104/0x258
      [   45.976609]  do_notify_resume+0x894/0xaf8
      [   45.977121]  work_pending+0xc/0x328
      [   45.977575]
      [   45.977775] The buggy address belongs to the page:
      [   45.978369] page:fffffc00003e6600 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4f998
      [   45.979522] flags: 0x7fffe0000000000(node=0|zone=0|lastcpupid=0x3ffff)
      [   45.980349] raw: 07fffe0000000000 fffffc00003e6708 ffff0000dac3c010 0000000000000000
      [   45.981309] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [   45.982259] page dumped because: kasan: bad access detected
      [   45.982948]
      [   45.983153] Memory state around the buggy address:
      [   45.983753]  ffff00000f997f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   45.984645]  ffff00000f998000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [   45.985533] >ffff00000f998080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [   45.986419]                                               ^
      [   45.987112]  ffff00000f998100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [   45.988006]  ffff00000f998180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [   45.988895] ==================================================================
      [   45.989773] Disabling lock debugging due to kernel taint
      [   45.990552] Kernel panic - not syncing: panic_on_warn set ...
      [   45.991166] CPU: 1 PID: 732 Comm: a.out Tainted: G    B             5.13.0+ #22
      [   45.991929] Hardware name: linux,dummy-virt (DT)
      [   45.992448] Call trace:
      [   45.992753]  dump_backtrace+0x0/0x4c8
      [   45.993208]  show_stack+0x30/0x40
      [   45.993627]  dump_stack_lvl+0x120/0x18c
      [   45.994113]  dump_stack+0x1c/0x34
      [   45.994530]  panic+0x3a4/0x7d8
      [   45.994930]  end_report+0x194/0x198
      [   45.995380]  kasan_report+0x134/0x200
      [   45.995850]  __asan_report_load8_noabort+0x2c/0x50
      [   45.996453]  bpf_xdp_link_release+0x3b8/0x3d0
      [   45.997007]  bpf_link_free+0xd0/0x188
      [   45.997474]  bpf_link_put+0x1d0/0x218
      [   45.997942]  bpf_link_release+0x3c/0x58
      [   45.998429]  __fput+0x20c/0x7e8
      [   45.998833]  ____fput+0x24/0x30
      [   45.999247]  task_work_run+0x104/0x258
      [   45.999731]  do_notify_resume+0x894/0xaf8
      [   46.000236]  work_pending+0xc/0x328
      [   46.000697] SMP: stopping secondary CPUs
      [   46.001226] Dumping ftrace buffer:
      [   46.001663]    (ftrace buffer empty)
      [   46.002110] Kernel Offset: disabled
      [   46.002545] CPU features: 0x00000001,23202c00
      [   46.003080] Memory Limit: none
      
      Fixes: aa8d3a71 ("bpf, xdp: Add bpf_link-based XDP attachment API")
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210710031635.41649-1-xuanzhuo@linux.alibaba.com
    • ipv6: allocate enough headroom in ip6_finish_output2() · 5796015f
      Committed by Vasily Averin
      When the TEE target mirrors traffic to another interface, sk_buff may
      not have enough headroom to be processed correctly.
      ip_finish_output2() detects this situation for ipv4 and allocates a
      new skb with enough headroom. However, ipv6 lacks this logic in
      ip6_finish_output2() and it leads to skb_under_panic:
      
       skbuff: skb_under_panic: text:ffffffffc0866ad4 len:96 put:24
       head:ffff97be85e31800 data:ffff97be85e317f8 tail:0x58 end:0xc0 dev:gre0
       ------------[ cut here ]------------
       kernel BUG at net/core/skbuff.c:110!
       invalid opcode: 0000 [#1] SMP PTI
       CPU: 2 PID: 393 Comm: kworker/2:2 Tainted: G           OE     5.13.0 #13
       Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
       Workqueue: ipv6_addrconf addrconf_dad_work
       RIP: 0010:skb_panic+0x48/0x4a
       Call Trace:
        skb_push.cold.111+0x10/0x10
        ipgre_header+0x24/0xf0 [ip_gre]
        neigh_connected_output+0xae/0xf0
        ip6_finish_output2+0x1a8/0x5a0
        ip6_output+0x5c/0x110
        nf_dup_ipv6+0x158/0x1000 [nf_dup_ipv6]
        tee_tg6+0x2e/0x40 [xt_TEE]
        ip6t_do_table+0x294/0x470 [ip6_tables]
        nf_hook_slow+0x44/0xc0
        nf_hook.constprop.34+0x72/0xe0
        ndisc_send_skb+0x20d/0x2e0
        ndisc_send_ns+0xd1/0x210
        addrconf_dad_work+0x3c8/0x540
        process_one_work+0x1d1/0x370
        worker_thread+0x30/0x390
        kthread+0x116/0x130
        ret_from_fork+0x22/0x30

      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 12 Jul, 2021 · 3 commits
    • bpf, test: fix NULL pointer dereference on invalid expected_attach_type · 5e21bb4e
      Committed by Xuan Zhuo
      These two types of XDP progs (BPF_XDP_DEVMAP, BPF_XDP_CPUMAP) will not be
      executed directly in the driver, therefore we should also not directly
      run them from here. To run in these two situations, there must be further
      preparations done, otherwise these may cause a kernel panic.
      
      For more details, see also dev_xdp_attach().
      
        [   46.982479] BUG: kernel NULL pointer dereference, address: 0000000000000000
        [   46.984295] #PF: supervisor read access in kernel mode
        [   46.985777] #PF: error_code(0x0000) - not-present page
        [   46.987227] PGD 800000010dca4067 P4D 800000010dca4067 PUD 10dca6067 PMD 0
        [   46.989201] Oops: 0000 [#1] SMP PTI
        [   46.990304] CPU: 7 PID: 562 Comm: a.out Not tainted 5.13.0+ #44
        [   46.992001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/24
        [   46.995113] RIP: 0010:___bpf_prog_run+0x17b/0x1710
        [   46.996586] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02
        [   47.001562] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246
        [   47.003115] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000
        [   47.005163] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98
        [   47.007135] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff
        [   47.009171] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98
        [   47.011172] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8
        [   47.013244] FS:  00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
        [   47.015705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   47.017475] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0
        [   47.019558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   47.021595] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   47.023574] PKRU: 55555554
        [   47.024571] Call Trace:
        [   47.025424]  __bpf_prog_run32+0x32/0x50
        [   47.026296]  ? printk+0x53/0x6a
        [   47.027066]  ? ktime_get+0x39/0x90
        [   47.027895]  bpf_test_run.cold.28+0x23/0x123
        [   47.028866]  ? printk+0x53/0x6a
        [   47.029630]  bpf_prog_test_run_xdp+0x149/0x1d0
        [   47.030649]  __sys_bpf+0x1305/0x23d0
        [   47.031482]  __x64_sys_bpf+0x17/0x20
        [   47.032316]  do_syscall_64+0x3a/0x80
        [   47.033165]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [   47.034254] RIP: 0033:0x7f04a51364dd
        [   47.035133] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 48
        [   47.038768] RSP: 002b:00007fff8f9fc518 EFLAGS: 00000213 ORIG_RAX: 0000000000000141
        [   47.040344] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f04a51364dd
        [   47.041749] RDX: 0000000000000048 RSI: 0000000020002a80 RDI: 000000000000000a
        [   47.043171] RBP: 00007fff8f9fc530 R08: 0000000002049300 R09: 0000000020000100
        [   47.044626] R10: 0000000000000004 R11: 0000000000000213 R12: 0000000000401070
        [   47.046088] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
        [   47.047579] Modules linked in:
        [   47.048318] CR2: 0000000000000000
        [   47.049120] ---[ end trace 7ad34443d5be719a ]---
        [   47.050273] RIP: 0010:___bpf_prog_run+0x17b/0x1710
        [   47.051343] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02
        [   47.054943] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246
        [   47.056068] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000
        [   47.057522] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98
        [   47.058961] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff
        [   47.060390] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98
        [   47.061803] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8
        [   47.063249] FS:  00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
        [   47.065070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   47.066307] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0
        [   47.067747] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   47.069217] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   47.070652] PKRU: 55555554
        [   47.071318] Kernel panic - not syncing: Fatal exception
        [   47.072854] Kernel Offset: disabled
        [   47.073683] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Fixes: 92164774 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
      Fixes: fbee97fe ("bpf: Add support to attach bpf program to a devmap entry")
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: David Ahern <dsahern@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210708080409.73525-1-xuanzhuo@linux.alibaba.com
    • net: bridge: multicast: fix MRD advertisement router port marking race · 000b7287
      Committed by Nikolay Aleksandrov
      When an MRD advertisement is received on a bridge port with multicast
      snooping enabled, we mark it as a router port automatically, which
      includes adding that port to the router port list. The multicast lock
      protects that list, but it is not acquired in the MRD advertisement
      case, leading to a race condition. We need to take it to fix the race.
      
      Cc: stable@vger.kernel.org
      Cc: linus.luessing@c0d3.blue
      Fixes: 4b3087c7 ("bridge: Snoop Multicast Router Advertisements")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: multicast: fix PIM hello router port marking race · 04bef83a
      Committed by Nikolay Aleksandrov
      When a PIM hello packet is received on a bridge port with multicast
      snooping enabled, we mark it as a router port automatically, which
      includes adding that port to the router port list. The multicast lock
      protects that list, but it is not acquired in the PIM message case,
      leading to a race condition. We need to take it to fix the race.
      
      Cc: stable@vger.kernel.org
      Fixes: 91b02d3d ("bridge: mcast: add router port on PIM hello message")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 10 Jul, 2021 · 7 commits
    • mptcp: properly account bulk freed memory · ce599c51
      Committed by Paolo Abeni
      After commit 87952603 ("mptcp: protect the rx path with
      the msk socket spinlock") the rmem currently used by a given
      msk is really sk_rmem_alloc - rmem_released.

      The safety check in mptcp_data_ready() does not take the above
      into account. As a result, legitimate incoming data is kept in the
      subflow receive queue for no reason, delaying or blocking
      MPTCP-level ack generation.

      This change addresses the issue by introducing a new helper to fetch
      the rmem memory and using it as needed. Additionally, add a MIB
      counter for the exceptional event described above - the peer is
      misbehaving.

      Finally, introduce the required annotation when rmem_released is
      updated.
      
      Fixes: 87952603 ("mptcp: protect the rx path with the msk socket spinlock")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/211
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
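The accounting described in this commit (memory in use is `sk_rmem_alloc - rmem_released`) can be sketched with plain integers. The structure and function names are hypothetical, and the comparison against `sk_rcvbuf` is an assumed form of the safety check, shown only to illustrate why the raw counter is too pessimistic.

```c
/* Hypothetical snapshot of the counters named in the commit message. */
struct msk_model {
	int sk_rmem_alloc;  /* memory charged to the socket */
	int rmem_released;  /* bulk-freed memory not yet uncharged */
	int sk_rcvbuf;      /* receive buffer limit */
};

/* Memory actually in use by the msk. */
int msk_rmem(const struct msk_model *m)
{
	return m->sk_rmem_alloc - m->rmem_released;
}

/* Check using the raw counter: over-counts already released memory. */
int msk_over_limit_naive(const struct msk_model *m)
{
	return m->sk_rmem_alloc > m->sk_rcvbuf;
}

/* Check using the helper: only memory still in use is counted. */
int msk_over_limit(const struct msk_model *m)
{
	return msk_rmem(m) > m->sk_rcvbuf;
}
```

For example, with 1000 bytes charged, 400 released and a limit of 800, the naive check reports the socket as over its limit while the corrected one does not, so data would have been held back in the subflow queue needlessly.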
    • mptcp: avoid processing packet if a subflow reset · 6787b7e3
      Committed by Jianguo Wu
      If check_fully_established() causes a subflow reset, tcp_data_queue()
      should not continue to process the packet.
      Add a return value to mptcp_incoming_options(), and return false if a
      subflow has been reset, else return true. Then drop the packet in
      tcp_data_queue()/tcp_rcv_state_process() if mptcp_incoming_options()
      returns false.
      
      Fixes: d5824847 ("mptcp: fix fallback for MP_JOIN subflows")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix syncookie process if mptcp can not_accept new subflow · 8547ea5f
      Committed by Jianguo Wu
      Stress testing with wrk and webfsd produces many "TCP: tcp_fin:
      Impossible, sk->sk_state=7" messages on the client side.

      At least two cases can trigger this warning:
      1. MPTCP is in syncookie mode and the server receives an MP_JOIN SYN
         request. In subflow_check_req(), mptcp_can_accept_new_subflow()
         returns false, so subflow_init_req_cookie_join_save() is not
         called, i.e. the data present in the MP_JOIN SYN request and the
         random nonce are not stored in the join_entries[] hash table, but
         a SYN-ACK is still sent. When the third ACK is received,
         mptcp_token_join_cookie_init_state() returns false and the third
         ACK is dropped. If the client then closes the MPTCP connection,
         it sends a DATA_FIN and an MPTCP FIN. The DATA_FIN carries
         neither MP_CAPABLE nor MP_JOIN, so mptcp_subflow_init_cookie_req()
         returns 0, the cookie check passes, and the MP_JOIN request falls
         back to normal TCP. The server sends a TCP FIN when it closes; on
         the client side, processing that FIN resets the subflow via:
           tcp_data_queue()->mptcp_incoming_options()
             ->check_fully_established()->mptcp_subflow_reset().
         mptcp_subflow_reset() sets the sock state to TCP_CLOSE, so
         tcp_fin() hits TCP_CLOSE and prints the warning.

      2. MPTCP is in syncookie mode and the server receives the third ACK.
         In mptcp_subflow_init_cookie_req(), mptcp_can_accept_new_subflow()
         returns false and subflow_req->mp_join is not set to 1, so
         subflow_syn_recv_sock() does not reset the MP_JOIN subflow but
         falls back to normal TCP. The same sequence then follows when the
         server closes and sends a TCP FIN.

      For case 1, make subflow_check_req() return -EPERM, so that
      tcp_conn_request() drops the MP_JOIN SYN.

      For case 2, let subflow_syn_recv_sock() call
      mptcp_can_accept_new_subflow() and, on failure, do a fatal fallback
      and send a reset.
      
      Fixes: 9466a1cc ("mptcp: enable JOIN requests even if cookies are in use")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: remove redundant req destruct in subflow_check_req() · 030d37bd
      Committed by Jianguo Wu
      In subflow_check_req(), if subflow sport is mismatch, will put msk,
      destroy token, and destruct req, then return -EPERM, which can be
      done by subflow_req_destructor() via:
      
        tcp_conn_request()
          |--__reqsk_free()
            |--subflow_req_destructor()
      
      So remove this redundant code; otherwise tcp_v4_reqsk_destructor()
      is called twice, which may double free inet_rsk(req)->ireq_opt.
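
A minimal user-space sketch of the ownership rule behind this fix: the check only reports the error, and the caller runs the destructor exactly once. The toy_* types and functions are hypothetical stand-ins, not kernel code; the opt pointer models inet_rsk(req)->ireq_opt.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request sock. */
struct toy_req {
    void *opt;           /* models inet_rsk(req)->ireq_opt */
    int destruct_calls;  /* counts destructor runs, for illustration */
};

/* Models subflow_req_destructor(): the single place that frees resources. */
static void toy_req_destructor(struct toy_req *req)
{
    req->destruct_calls++;
    free(req->opt);
    req->opt = NULL;
}

/* Models the fixed check: report the error and leave cleanup to the caller. */
static int toy_check_req(struct toy_req *req, bool sport_matches)
{
    (void)req;
    return sport_matches ? 0 : -EPERM;
}

/* Models tcp_conn_request(): on failure the request is destructed once. */
static int toy_conn_request(struct toy_req *req, bool sport_matches)
{
    int err = toy_check_req(req, sport_matches);

    if (err)
        toy_req_destructor(req);
    return err;
}
```

If toy_check_req() also called the destructor on the error path, the caller's cleanup would run a second time, which is the pattern the commit removes.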
      
      Fixes: 5bc56388 ("mptcp: add port number check for MP_JOIN")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join · 0c71929b
      Jianguo Wu committed
      I ran a stress test with wrk[1] and webfsd[2], with the assistance of
      mptcp-tools[3]:
      
        Server side:
            ./use_mptcp.sh webfsd -4 -R /tmp/ -p 8099
        Client side:
            ./use_mptcp.sh wrk -c 200 -d 30 -t 4 http://192.168.174.129:8099/
      
      and got the following warning message:
      
      [   55.552626] TCP: request_sock_subflow: Possible SYN flooding on port 8099. Sending cookies.  Check SNMP counters.
      [   55.553024] ------------[ cut here ]------------
      [   55.553027] WARNING: CPU: 0 PID: 10 at net/core/flow_dissector.c:984 __skb_flow_dissect+0x280/0x1650
      ...
      [   55.553117] CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.12.0+ #18
      [   55.553121] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
      [   55.553124] RIP: 0010:__skb_flow_dissect+0x280/0x1650
      ...
      [   55.553133] RSP: 0018:ffffb79580087770 EFLAGS: 00010246
      [   55.553137] RAX: 0000000000000000 RBX: ffffffff8ddb58e0 RCX: ffffb79580087888
      [   55.553139] RDX: ffffffff8ddb58e0 RSI: ffff8f7e4652b600 RDI: 0000000000000000
      [   55.553141] RBP: ffffb79580087858 R08: 0000000000000000 R09: 0000000000000008
      [   55.553143] R10: 000000008c622965 R11: 00000000d3313a5b R12: ffff8f7e4652b600
      [   55.553146] R13: ffff8f7e465c9062 R14: 0000000000000000 R15: ffffb79580087888
      [   55.553149] FS:  0000000000000000(0000) GS:ffff8f7f75e00000(0000) knlGS:0000000000000000
      [   55.553152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.553154] CR2: 00007f73d1d19000 CR3: 0000000135e10004 CR4: 00000000003706f0
      [   55.553160] Call Trace:
      [   55.553166]  ? __sha256_final+0x67/0xd0
      [   55.553173]  ? sha256+0x7e/0xa0
      [   55.553177]  __skb_get_hash+0x57/0x210
      [   55.553182]  subflow_init_req_cookie_join_save+0xac/0xc0
      [   55.553189]  subflow_check_req+0x474/0x550
      [   55.553195]  ? ip_route_output_key_hash+0x67/0x90
      [   55.553200]  ? xfrm_lookup_route+0x1d/0xa0
      [   55.553207]  subflow_v4_route_req+0x8e/0xd0
      [   55.553212]  tcp_conn_request+0x31e/0xab0
      [   55.553218]  ? selinux_socket_sock_rcv_skb+0x116/0x210
      [   55.553224]  ? tcp_rcv_state_process+0x179/0x6d0
      [   55.553229]  tcp_rcv_state_process+0x179/0x6d0
      [   55.553235]  tcp_v4_do_rcv+0xaf/0x220
      [   55.553239]  tcp_v4_rcv+0xce4/0xd80
      [   55.553243]  ? ip_route_input_rcu+0x246/0x260
      [   55.553248]  ip_protocol_deliver_rcu+0x35/0x1b0
      [   55.553253]  ip_local_deliver_finish+0x44/0x50
      [   55.553258]  ip_local_deliver+0x6c/0x110
      [   55.553262]  ? ip_rcv_finish_core.isra.19+0x5a/0x400
      [   55.553267]  ip_rcv+0xd1/0xe0
      ...
      
      After debugging, I found that in __skb_flow_dissect(), skb->dev and
      skb->sk are both NULL, so net is NULL and WARN_ON_ONCE(!net) triggers.
      In fact, net is always NULL in this code path, because skb->dev is set
      to NULL in tcp_v4_rcv() and skb->sk is never set.
      
      Code snippet in __skb_flow_dissect() that triggers the warning:
        975         if (skb) {
        976                 if (!net) {
        977                         if (skb->dev)
        978                                 net = dev_net(skb->dev);
        979                         else if (skb->sk)
        980                                 net = sock_net(skb->sk);
        981                 }
        982         }
        983
        984         WARN_ON_ONCE(!net);
      
      So use a hash derived from the seq and the transport header instead.
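
As a rough illustration of a hash derived from the sequence number and the transport header, here is a user-space sketch. Everything named toy_* is hypothetical, the mixer is a generic stand-in rather than the kernel's hash, and subtracting one from the sequence number of a non-SYN segment is an assumption of this sketch so that the SYN and the following ACK land in the same slot.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_JOIN_SLOTS 1024u /* hypothetical table size */

/* Hypothetical view of the fields the hash is derived from. */
struct toy_tcp_info {
    uint32_t seq;    /* sequence number of the segment */
    uint16_t source; /* source port */
    uint16_t dest;   /* destination port */
    bool syn;        /* true for the initial SYN */
};

/* Stand-in 32-bit mixer; not the hash function the kernel uses. */
static uint32_t toy_mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Map a segment to a table slot using only seq, ports, and a secret. */
static uint32_t toy_join_slot(const struct toy_tcp_info *th, uint32_t secret)
{
    /* Assumption: normalize the ACK's seq back to the SYN's seq. */
    uint32_t seq = th->syn ? th->seq : th->seq - 1;
    uint32_t ports = ((uint32_t)th->source << 16) | th->dest;

    return toy_mix32(toy_mix32(seq ^ secret) ^ ports) % TOY_JOIN_SLOTS;
}
```

Because the inputs come from the packet itself, no skb->dev or skb->sk lookup is needed, which is what avoids the WARN_ON_ONCE(!net) path.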
      
      [1] https://github.com/wg/wrk
      [2] https://github.com/ourway/webfsd
      [3] https://github.com/pabeni/mptcp-tools
      
      Fixes: 9466a1cc ("mptcp: enable JOIN requests even if cookies are in use")
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      net: ip_tunnel: fix mtu calculation for ETHER tunnel devices · 9992a078
      Hangbin Liu committed
      Commit 28e104d0 ("net: ip_tunnel: fix mtu calculation") removed the
      dev->hard_header_len subtraction when calculating the MTU for tunnel
      devices, as there is an overhead for devices that have header_ops.
      
      But there are ETHER tunnel devices, like gre_tap or erspan, which don't
      have header_ops but set dev->hard_header_len during setup. As a result,
      packets larger than (MTU - ETH_HLEN) cannot be transmitted. Fix it by
      subtracting the ETHER tunnel devices' dev->hard_header_len in the MTU
      calculation.
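
The arithmetic can be sketched in user space as follows. The toy_* struct and function are hypothetical and only model the rule described above: the link-layer header length is subtracted for ETHER-type tunnel devices and not for others.

```c
#include <stdbool.h>

#define TOY_ETH_HLEN 14 /* Ethernet header length in bytes */

/* Hypothetical stand-in for the tunnel device fields that matter here. */
struct toy_tunnel_dev {
    bool is_ether;       /* true for ETHER tunnels such as gre_tap or erspan */
    int hard_header_len; /* set during device setup */
    int tunnel_hlen;     /* outer IP header plus tunnel header, in bytes */
};

/* Compute the tunnel MTU from the MTU of the underlying device. */
static int toy_tunnel_mtu(int lower_mtu, const struct toy_tunnel_dev *dev)
{
    int mtu = lower_mtu - dev->tunnel_hlen;

    /* Only ETHER tunnel devices carry an inner link-layer header. */
    if (dev->is_ether)
        mtu -= dev->hard_header_len;
    return mtu;
}
```

For example, with a lower MTU of 1500 and an assumed 24 bytes of tunnel overhead (20-byte IPv4 header plus 4-byte GRE header), an ETHER tunnel with a 14-byte hard_header_len gets 1462, while a non-ETHER tunnel gets 1476.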
      
      Fixes: 28e104d0 ("net: ip_tunnel: fix mtu calculation")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: do not reuse skbuff allocated from skbuff_fclone_cache in the skb cache · 28b34f01
      Antoine Tenart committed
      Some socket buffers allocated in the fclone cache (in __alloc_skb) can
      end up in the following path[1]:
      
      napi_skb_finish
        __kfree_skb_defer
          napi_skb_cache_put
      
      The issue is that napi_skb_cache_put is not fclone-friendly and puts
      those skbuffs in the skb cache to be reused later, although this cache
      only expects skbuffs allocated from skbuff_head_cache. When this
      happens, the skbuff is eventually freed using the wrong origin cache,
      and we can see traces similar to:
      
      [ 1223.947534] cache_from_obj: Wrong slab cache. skbuff_head_cache but object is from skbuff_fclone_cache
      [ 1223.948895] WARNING: CPU: 3 PID: 0 at mm/slab.h:442 kmem_cache_free+0x251/0x3e0
      [ 1223.950211] Modules linked in:
      [ 1223.950680] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.13.0+ #474
      [ 1223.951587] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-3.fc34 04/01/2014
      [ 1223.953060] RIP: 0010:kmem_cache_free+0x251/0x3e0
      
      This sometimes leads to other memory-related issues.
      
      Fix this by using __kfree_skb for fclone skbuffs, similar to what is
      done in the other place where __kfree_skb_defer is called.
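
The decision described above can be modeled in a few lines of user-space C. This is a sketch under stated assumptions: the toy_* types are hypothetical, the origin field stands in for knowing which slab cache a buffer came from, and the counter stands in for the per-CPU skb cache.

```c
/* Which slab cache a buffer was allocated from (hypothetical model). */
enum toy_origin { TOY_HEAD_CACHE, TOY_FCLONE_CACHE };

/* Which free path the buffer takes. */
enum toy_free_path { TOY_FREE_DIRECT, TOY_FREE_DEFERRED };

struct toy_skb {
    enum toy_origin origin;
};

/* Models the recycling cache, which only expects head-cache buffers. */
struct toy_napi_cache {
    int recycled; /* number of buffers queued for reuse */
};

/* Pick the free path: fclone buffers must never enter the reuse cache. */
static enum toy_free_path toy_finish(struct toy_napi_cache *nc,
                                     const struct toy_skb *skb)
{
    if (skb->origin == TOY_FCLONE_CACHE)
        return TOY_FREE_DIRECT; /* models __kfree_skb() */

    nc->recycled++;             /* models napi_skb_cache_put() */
    return TOY_FREE_DEFERRED;   /* models __kfree_skb_defer() */
}
```

Keeping fclone buffers out of the reuse cache means every recycled buffer is later freed back to the cache it was allocated from.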
      
      [1] At least in setups using veth pairs and tunnels. Building a kernel
          with KASAN we can for example see packets allocated in
          sk_stream_alloc_skb hit the above path and later the issue arises
          when the skbuff is reused.
      
      Fixes: 9243adfc ("skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing")
      Cc: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>