1. 02 Apr, 2021 5 commits
  2. 27 Mar, 2021 1 commit
  3. 25 Mar, 2021 1 commit
    • P
      net: resolve forwarding path from virtual netdevice and HW destination address · ddb94eaf
      Pablo Neira Ayuso authored
      This patch adds dev_fill_forward_path() which resolves the path to reach
      the real netdevice from the IP forwarding side. This function takes as
      input the netdevice and the destination hardware address and it walks
      down the devices calling .ndo_fill_forward_path() for each device until
      the real device is found.
      
      For instance, assuming the following topology:
      
                     IP forwarding
                    /             \
                 br0              eth0
                 / \
             eth1  eth2
              .
              .
              .
             ethX
       ab:cd:ef:ab:cd:ef
      
      where eth1 and eth2 are bridge ports and eth0 provides WAN connectivity.
      ethX is the interface in another box which is connected to the eth1
      bridge port.
      
      For packets going through IP forwarding to br0 whose destination MAC
      address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path() provides the
      following path:
      
      	br0 -> eth1
      
      .ndo_fill_forward_path for br0 looks up the destination MAC address in
      the FDB to get the bridge port eth1.
      
      This information makes it possible to create a fast path that bypasses
      the classic bridge and IP forwarding paths, so packets go directly from
      the bridge port eth1 to eth0 (WAN interface) and vice versa.
      
                   fast path
            .------------------------.
           /                          \
          |           IP forwarding   |
          |          /             \  \/
          |       br0               eth0
          .       / \
           -> eth1  eth2
              .
              .
              .
             ethX
       ab:cd:ef:ab:cd:ef
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
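The path walk described in this commit can be modelled in user space. The following is a minimal Python sketch, not kernel code: `Device`, `Bridge`, `resolve_forward_path` and the `max_depth` bound are hypothetical names chosen for illustration, and only the idea of asking each device for its lower device comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    """Hypothetical stand-in for a netdevice; a plain Device is a real device."""
    name: str

    def fill_forward_path(self, dest_mac: str) -> Optional["Device"]:
        # A real device has no lower device to resolve to.
        return None


@dataclass
class Bridge(Device):
    """Hypothetical bridge whose FDB maps destination MACs to bridge ports."""
    fdb: Dict[str, Device] = field(default_factory=dict)

    def fill_forward_path(self, dest_mac: str) -> Optional[Device]:
        # Models the bridge's .ndo_fill_forward_path(): an FDB lookup.
        port = self.fdb.get(dest_mac)
        if port is None:
            raise LookupError(f"{dest_mac} not found in FDB of {self.name}")
        return port


def resolve_forward_path(dev: Device, dest_mac: str, max_depth: int = 8) -> List[str]:
    """Walk down from dev until a device with no lower device is reached."""
    path = [dev.name]
    for _ in range(max_depth):  # max_depth is an arbitrary bound for this sketch
        lower = dev.fill_forward_path(dest_mac)
        if lower is None:
            return path
        path.append(lower.name)
        dev = lower
    raise RuntimeError("forward path too deep")
```

With an FDB entry on `br0` that maps ab:cd:ef:ab:cd:ef to `eth1`, `resolve_forward_path(br0, "ab:cd:ef:ab:cd:ef")` yields `["br0", "eth1"]`, which mirrors the `br0 -> eth1` path in the commit message.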
  4. 24 Mar, 2021 1 commit
    • D
      net: make unregister netdev warning timeout configurable · 5aa3afe1
      Dmitry Vyukov authored
      netdev_wait_allrefs() issues a warning if refcount does not drop to 0
      after 10 seconds. While a 10-second wait should generally not happen
      under a normal workload in a normal environment, the warning seems to
      fire falsely very often during fuzzing and/or in qemu emulation (~10x
      slower). At the very least, it is not possible to tell whether it is
      really a false positive or not. Automated testing generally bumps all
      timeouts to very high values to avoid flaky failures.
      Add net.core.netdev_unregister_timeout_secs sysctl to make
      the timeout configurable for automated testing systems.
      Lowering the timeout may also be useful for e.g. manual bisection.
      The default value matches the current behavior.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
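To illustrate why a configurable timeout helps slow test environments, here is a rough Python model of a wait loop that warns whenever the reference count has stayed non-zero for `timeout_secs` simulated seconds. It is only a sketch: `wait_allrefs`, `read_refcnt` and `max_ticks` are hypothetical names, one loop iteration stands in for one second, and the real netdev_wait_allrefs() logic is more involved.

```python
from typing import Callable, List


def wait_allrefs(read_refcnt: Callable[[], int],
                 timeout_secs: int = 10,
                 max_ticks: int = 100) -> List[str]:
    """Poll once per simulated second; warn each time timeout_secs elapse."""
    warnings: List[str] = []
    waited = 0
    for _ in range(max_ticks):
        refcnt = read_refcnt()
        if refcnt == 0:
            break  # device is free, stop waiting
        waited += 1
        if waited >= timeout_secs:
            warnings.append(
                f"waiting for device to become free. Usage count = {refcnt}")
            waited = 0  # restart the warning interval
    return warnings
```

In this model, a device that needs 15 simulated seconds to drop its last reference produces one warning with the default of 10 and none with a timeout of 60, which is the effect the net.core.netdev_unregister_timeout_secs sysctl is meant to give automated testing systems.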
  5. 23 Mar, 2021 4 commits
  6. 20 Mar, 2021 1 commit
    • E
      net: add CONFIG_PCPU_DEV_REFCNT · 919067cc
      Eric Dumazet authored
      I was working on a syzbot issue, claiming one device could not be
      dismantled because its refcount was -1
      
      unregister_netdevice: waiting for sit0 to become free. Usage count = -1
      
      It would be nice if syzbot could trigger a warning at the time
      this reference count became negative.
      
      This patch adds the CONFIG_PCPU_DEV_REFCNT option, which defaults
      to per-cpu variables (as before this patch) on SMP builds.
      
      v2: free_dev label in alloc_netdev_mqs() is moved to avoid
          a compiler warning (-Wunused-label), as reported
          by kernel test robot <lkp@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
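The trade-off behind this option can be shown with a small Python model. It is a sketch under simplifying assumptions: `PerCpuRefcnt` and `CheckedRefcnt` are hypothetical classes, and the checked counter here only warns and keeps counting, which is not how the kernel's own reference count type behaves in detail.

```python
import warnings
from typing import List


class PerCpuRefcnt:
    """Per-CPU style counter: cheap updates, only the sum is meaningful."""

    def __init__(self, ncpus: int) -> None:
        self.counts: List[int] = [0] * ncpus

    def hold(self, cpu: int) -> None:
        self.counts[cpu] += 1

    def put(self, cpu: int) -> None:
        # No check is possible here: a single CPU's slot may legitimately
        # go negative when hold() and put() run on different CPUs.
        self.counts[cpu] -= 1

    def read(self) -> int:
        return sum(self.counts)


class CheckedRefcnt:
    """Single shared counter that can flag an underflow when it happens."""

    def __init__(self) -> None:
        self.count = 0

    def hold(self) -> None:
        self.count += 1

    def put(self) -> None:
        if self.count <= 0:
            warnings.warn("refcount underflow", RuntimeWarning)
        self.count -= 1
```

With the per-CPU model, an extra `put()` is only visible later as a total of -1, as in the "Usage count = -1" message above. With the single checked counter, the extra `put()` raises a warning at the moment it happens.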
  7. 19 Mar, 2021 15 commits
  8. 18 Mar, 2021 1 commit
    • W
      net: fix race between napi kthread mode and busy poll · cb038357
      Wei Wang authored
      Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
      determine if the kthread owns this napi and could call napi->poll() on
      it. However, if socket busy poll is enabled, it is possible that the
      busy poll thread grabs this SCHED bit (after the previous napi->poll()
      invokes napi_complete_done() and clears SCHED bit) and tries to poll
      on the same napi. napi_disable() could grab the SCHED bit as well.
      This patch tries to fix this race by adding a new bit
      NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
      ____napi_schedule() if the threaded mode is enabled, and gets cleared
      in napi_complete_done(), and we only poll the napi in kthread if this
      bit is set. This helps distinguish the ownership of the napi between
      kthread and other scenarios and fixes the race issue.
      
      Fixes: 29863d41 ("net: implement threaded-able napi poll loop support")
      Reported-by: Martin Zaharinov <micron10@gmail.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Cc: Alexander Duyck <alexanderduyck@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
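The ownership rule introduced by this fix can be sketched as a tiny single-threaded state model in Python. The bit values below are arbitrary, `Napi` and its methods are hypothetical names, and the model folds several kernel steps together; it only illustrates why a second bit lets the kthread tell its own scheduling apart from a busy-poll grab.

```python
NAPI_STATE_SCHED = 1 << 0           # arbitrary bit positions for this sketch
NAPI_STATE_SCHED_THREADED = 1 << 1


class Napi:
    def __init__(self, threaded: bool) -> None:
        self.threaded = threaded
        self.state = 0

    def schedule(self) -> None:
        """Models scheduling: in threaded mode, also mark kthread ownership."""
        self.state |= NAPI_STATE_SCHED
        if self.threaded:
            self.state |= NAPI_STATE_SCHED_THREADED

    def complete_done(self) -> None:
        """Models napi_complete_done(): release both bits."""
        self.state &= ~(NAPI_STATE_SCHED | NAPI_STATE_SCHED_THREADED)

    def busy_poll_grab(self) -> bool:
        """Models a busy-poll thread taking the SCHED bit for itself."""
        if self.state & NAPI_STATE_SCHED:
            return False
        self.state |= NAPI_STATE_SCHED
        return True

    def kthread_may_poll(self) -> bool:
        # Before the fix the kthread checked SCHED alone, which a busy-poll
        # grab also sets; checking SCHED_THREADED removes that ambiguity.
        return bool(self.state & NAPI_STATE_SCHED_THREADED)
```

After `schedule()` and `complete_done()`, a successful `busy_poll_grab()` leaves SCHED set but SCHED_THREADED clear, so `kthread_may_poll()` returns `False` and the kthread does not poll a napi it does not own.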
  9. 16 Mar, 2021 3 commits
    • M
      can: dev: Move device back to init netns on owning netns delete · 3a5ca857
      Martin Willi authored
      When a non-initial netns is destroyed, the usual policy is to delete
      all virtual network interfaces contained, but move physical interfaces
      back to the initial netns. This keeps the physical interface visible
      on the system.
      
      CAN devices are somewhat special, as they define rtnl_link_ops even
      if they are physical devices. If a CAN interface is moved into a
      non-initial netns, destroying that netns lets the interface vanish
      instead of moving it back to the initial netns. default_device_exit()
      skips CAN interfaces due to having rtnl_link_ops set. Reproducer:
      
        ip netns add foo
        ip link set can0 netns foo
        ip netns delete foo
      
      WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
      CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
      Workqueue: netns cleanup_net
      [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
      [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
      [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
      [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
      [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
      [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
      [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
      [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
      [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
      [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
      
      To properly restore physical CAN devices to the initial netns on owning
      netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
      For CAN devices setting this flag, default_device_exit() considers them
      non-virtual, applying the usual namespace move.
      
      The issue was introduced in the commit mentioned below, as at that time
      CAN devices did not have a dellink() operation.
      
      Fixes: e008b5fc ("net: Simplfy default_device_exit and improve batching.")
      Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
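The namespace-exit policy described in this commit boils down to a small decision function. The Python sketch below is a model only: `LinkOps`, `NetDev`, `on_netns_exit` and especially the flag name `move_back_on_exit` are hypothetical, since the commit message does not name the new rtnl_link_ops flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkOps:
    """Hypothetical stand-in for rtnl_link_ops."""
    kind: str
    move_back_on_exit: bool = False  # hypothetical name for the new flag


@dataclass
class NetDev:
    name: str
    link_ops: Optional[LinkOps] = None  # None models a plain physical device


def on_netns_exit(dev: NetDev) -> str:
    """Decide what happens to dev when its non-initial netns is destroyed."""
    if dev.link_ops is not None and not dev.link_ops.move_back_on_exit:
        # Treated as virtual: not moved back, so it is deleted with the netns.
        return "delete"
    # Physical device, or a driver that opted in through the flag.
    return "move-to-init-netns"
```

In this model a CAN device whose link ops do not set the flag is deleted along with the namespace, which is the bug from the reproducer, while one that sets it is moved back to the initial netns like any other physical interface.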
    • L
      net: export dev_set_threaded symbol · 8f64860f
      Lorenzo Bianconi authored
      For wireless devices (e.g. mt76 driver) multiple net_devices belong to
      the same wireless phy and the napi object is registered in a dummy
      netdevice related to the wireless phy.
      Export dev_set_threaded so that device drivers can reuse it to enable
      threaded NAPI.
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      bpf: Add getter and setter for SO_REUSEPORT through bpf_{g,s}etsockopt · 6503b9f2
      Manu Bretelle authored
      Augment the current set of options that are accessible via
      bpf_{g,s}etsockopt to also support SO_REUSEPORT.
      Signed-off-by: Manu Bretelle <chantra@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210310182305.1910312-1-chantra@fb.com
  10. 15 Mar, 2021 5 commits
  11. 12 Mar, 2021 1 commit
  12. 11 Mar, 2021 2 commits
    • I
      drop_monitor: Perform cleanup upon probe registration failure · 9398e9c0
      Ido Schimmel authored
      In the rare case that drop_monitor fails to register its probe on the
      'napi_poll' tracepoint, it will not deactivate its hysteresis timer as
      part of the error path. If the hysteresis timer was armed by the
      short-lived 'kfree_skb' probe and user space retries to initiate tracing, a
      warning will be emitted for trying to initialize an active object [1].
      
      Fix this by properly undoing all the operations that were done prior to
      probe registration, in both software and hardware code paths.
      
      Note that syzkaller managed to fail probe registration by injecting a
      slab allocation failure [2].
      
      [1]
      ODEBUG: init active (active state 0) object type: timer_list hint: sched_send_work+0x0/0x60 include/linux/list.h:135
      WARNING: CPU: 1 PID: 8649 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Modules linked in:
      CPU: 1 PID: 8649 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      [...]
      Call Trace:
       __debug_object_init+0x524/0xd10 lib/debugobjects.c:588
       debug_timer_init kernel/time/timer.c:722 [inline]
       debug_init kernel/time/timer.c:770 [inline]
       init_timer_key+0x2d/0x340 kernel/time/timer.c:814
       net_dm_trace_on_set net/core/drop_monitor.c:1111 [inline]
       set_all_monitor_traces net/core/drop_monitor.c:1188 [inline]
       net_dm_monitor_start net/core/drop_monitor.c:1295 [inline]
       net_dm_cmd_trace+0x720/0x1220 net/core/drop_monitor.c:1339
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739
       genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
       netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2348
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2402
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2435
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [2]
       FAULT_INJECTION: forcing a failure.
       name failslab, interval 1, probability 0, space 0, times 1
       CPU: 1 PID: 8645 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       Call Trace:
        dump_stack+0xfa/0x151
        should_fail.cold+0x5/0xa
        should_failslab+0x5/0x10
        __kmalloc+0x72/0x3f0
        tracepoint_add_func+0x378/0x990
        tracepoint_probe_register+0x9c/0xe0
        net_dm_cmd_trace+0x7fc/0x1220
        genl_family_rcv_msg_doit+0x228/0x320
        genl_rcv_msg+0x328/0x580
        netlink_rcv_skb+0x153/0x420
        genl_rcv+0x24/0x40
        netlink_unicast+0x533/0x7d0
        netlink_sendmsg+0x856/0xd90
        sock_sendmsg+0xcf/0x120
        ____sys_sendmsg+0x6e8/0x810
        ___sys_sendmsg+0xf3/0x170
        __sys_sendmsg+0xe5/0x1b0
        do_syscall_64+0x2d/0x70
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 70c69274 ("drop_monitor: Initialize timer and work item upon tracing enable")
      Fixes: 8ee2267a ("drop_monitor: Convert to using devlink tracepoint")
      Reported-by: syzbot+779559d6503f3a56213d@syzkaller.appspotmail.com
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
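The fix follows the usual pattern of undoing, in reverse order, every step that succeeded before the failing one. A minimal Python sketch of that pattern is shown here; `run_with_unwind` and the step tuples are hypothetical, and the kernel code expresses the same idea with error labels rather than a generic helper.

```python
from typing import Callable, List, Tuple

# (name, setup, undo): each step knows how to reverse itself.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_unwind(steps: List[Step]) -> None:
    """Run setup steps in order; on failure, undo completed ones in reverse."""
    done: List[Step] = []
    try:
        for step in steps:
            _name, setup, _undo = step
            setup()
            done.append(step)
    except Exception:
        for _name, _setup, undo in reversed(done):
            undo()
        raise  # the caller still sees the original error
```

Applied to the scenario in the commit message, a failure while registering the 'napi_poll' probe would unregister the 'kfree_skb' probe and then stop the hysteresis timer, so a later retry starts from a clean state.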
    • Y
      skbuff: remove some unnecessary operation in skb_segment_list() · 1ddc3229
      Yunsheng Lin authored
      The GRO list uses skb_shinfo(skb)->frag_list to link two skbs together,
      and NAPI_GRO_CB(p)->last->next is used when there are more skbs;
      see skb_gro_receive_list(). GSO expects the segmented skbs to be
      linked together using skb->next, so only the first skb->next needs
      to be set to skb_shinfo(skb)->frag_list when doing GSO list segmentation.
      
      For the same reason, nskb->next does not need to be set to
      list_skb before jumping to the error handling, because nskb->next already
      points to list_skb.
      
      nskb is also the last skb at the end of the loop, so remove the tail
      variable and use nskb instead.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
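The linking argument in this commit can be checked with a toy model. The Python sketch below uses a hypothetical `Skb` class with `next`, `frag_list` and `last` attributes standing in for skb->next, skb_shinfo(skb)->frag_list and NAPI_GRO_CB(p)->last; it leaves out everything else that real segmentation does.

```python
from typing import List, Optional


class Skb:
    """Toy skb: only the linkage fields discussed in the commit message."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.next: Optional["Skb"] = None
        self.frag_list: Optional["Skb"] = None
        self.last: "Skb" = self  # models NAPI_GRO_CB(p)->last


def gro_receive_list(head: Skb, skb: Skb) -> None:
    """First extra skb hangs off frag_list; later ones chain via ->next."""
    if head.last is head:
        head.frag_list = skb
    else:
        head.last.next = skb
    head.last = skb


def segment_list(head: Skb) -> Skb:
    """Only the head needs relinking; the rest are already chained by ->next."""
    head.next = head.frag_list
    head.frag_list = None
    return head


def walk(head: Skb) -> List[str]:
    names = []
    skb: Optional[Skb] = head
    while skb is not None:
        names.append(skb.name)
        skb = skb.next
    return names
```

After merging skbs b, c and d into a, `walk(segment_list(a))` visits a, b, c, d in order even though `segment_list()` touched only the head, which is the observation that makes the extra assignments removable.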