1. 07 Nov, 2021 1 commit
  2. 06 Nov, 2021 2 commits
    • ipv6: remove useless assignment to newinet in tcp_v6_syn_recv_sock() · 70bf363d
      Nghia Le committed
      The newinet value is initialized with inet_sk() in a block of code that
      handles sockets for the ETH_P_IP protocol. Along this code path,
      newinet is never read. Thus, the assignment to newinet is needless and
      can be removed.
      Signed-off-by: Nghia Le <nghialm78@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211104143740.32446-1-nghialm78@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 9bea6aa4
      Jakub Kicinski committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-11-05
      
      We've added 15 non-merge commits during the last 3 day(s) which contain
      a total of 14 files changed, 199 insertions(+), 90 deletions(-).
      
      The main changes are:
      
      1) Fix regression from stack spill/fill of <8 byte scalars, from Martin KaFai Lau.
      
      2) Fix perf's build of bpftool's bootstrap version due to missing libbpf
         headers, from Quentin Monnet.
      
      3) Fix riscv{32,64} BPF exception tables build errors and warnings, from Björn Töpel.
      
      4) Fix bpf fs to allow RENAME_EXCHANGE support for atomic upgrades on sk_lookup
         control planes, from Lorenz Bauer.
      
      5) Fix libbpf's error reporting in bpf_map_lookup_and_delete_elem_flags() due to
         missing libbpf_err_errno(), from Mehrdad Arshad Rad.
      
      6) Various fixes to make xdp_redirect_multi selftest more reliable, from Hangbin Liu.
      
      7) Fix netcnt selftest to make it run serial and thus avoid conflicts with other
         cgroup/skb selftests run in parallel that could cause flakes, from Andrii Nakryiko.
      
      8) Fix reuseport_bpf_numa networking selftest to skip unavailable NUMA nodes,
         from Kleber Sacilotto de Souza.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        riscv, bpf: Fix RV32 broken build, and silence RV64 warning
        selftests/bpf/xdp_redirect_multi: Limit the tests in netns
        selftests/bpf/xdp_redirect_multi: Give tcpdump a chance to terminate cleanly
        selftests/bpf/xdp_redirect_multi: Use arping to accurate the arp number
        selftests/bpf/xdp_redirect_multi: Put the logs to tmp folder
        libbpf: Fix lookup_and_delete_elem_flags error reporting
        bpftool: Install libbpf headers for the bootstrap version, too
        selftests/net: Fix reuseport_bpf_numa by skipping unavailable nodes
        selftests/bpf: Verifier test on refill from a smaller spill
        bpf: Do not reject when the stack read size is different from the tracked scalar size
        selftests/bpf: Make netcnt selftests serial to avoid spurious failures
        selftests/bpf: Test RENAME_EXCHANGE and RENAME_NOREPLACE on bpffs
        selftests/bpf: Convert test_bpffs to ASSERT macros
        libfs: Support RENAME_EXCHANGE in simple_rename()
        libfs: Move shmem_exchange to simple_rename_exchange
      ====================
      
      Link: https://lore.kernel.org/r/20211105165803.29372-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 05 Nov, 2021 27 commits
  4. 04 Nov, 2021 4 commits
    • selftests/net: Fix reuseport_bpf_numa by skipping unavailable nodes · a38bc45a
      Kleber Sacilotto de Souza committed
      On some platforms the NUMA node numbers are not necessarily consecutive,
      meaning that not all nodes from 0 to the value returned by numa_max_node()
      are available on the system. Using node numbers which are not available
      results in errors from libnuma such as:
      
        ---- IPv4 UDP ----
        send node 0, receive socket 0
        libnuma: Warning: Cannot read node cpumask from sysfs
        ./reuseport_bpf_numa: failed to pin to node: No such file or directory
      
      Fix it by checking if the node number bit is set in numa_nodes_ptr, which
      is defined in libnuma as "Set with all nodes the kernel has exposed to
      userspace".
      Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20211101145317.286118-1-kleber.souza@canonical.com
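The node-skipping idea described in the commit above can be sketched in plain C. This is only an illustration under stated assumptions: a `uint64_t` bitmask stands in for libnuma's `numa_nodes_ptr`, and the helper names `node_is_available()` and `collect_available_nodes()` are hypothetical. With libnuma itself, the equivalent per-node check would presumably be `numa_bitmask_isbitset(numa_nodes_ptr, node)`.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for libnuma's numa_nodes_ptr:
 * bit n set in node_mask means node n is exposed to userspace.
 */
static int node_is_available(uint64_t node_mask, unsigned int node)
{
    return node < 64 && ((node_mask >> node) & 1u);
}

/*
 * Walk node numbers 0..max_node and record only the available ones,
 * skipping holes instead of failing on them.
 * Returns the number of entries written to out.
 */
static size_t collect_available_nodes(uint64_t node_mask, unsigned int max_node,
                                      unsigned int *out, size_t out_len)
{
    size_t n = 0;

    for (unsigned int node = 0; node <= max_node && n < out_len; node++) {
        if (!node_is_available(node_mask, node))
            continue; /* node number not present on this system */
        out[n++] = node;
    }
    return n;
}
```

A test loop would then iterate over the collected node numbers only, rather than over every integer up to the maximum node number.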
    • net: fix possible NULL deref in sock_reserve_memory · d00c8ee3
      Eric Dumazet committed
      The sanity check in sock_reserve_memory() was not enough to prevent a
      malicious user from triggering a NULL deref.
      
      In this case, the issue is that sk_prot->memory_allocated is NULL.
      
      Use standard sk_has_account() helper to deal with this.
      
      BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
      BUG: KASAN: null-ptr-deref in atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline]
      BUG: KASAN: null-ptr-deref in sk_memory_allocated_add include/net/sock.h:1371 [inline]
      BUG: KASAN: null-ptr-deref in sock_reserve_memory net/core/sock.c:994 [inline]
      BUG: KASAN: null-ptr-deref in sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443
      Write of size 8 at addr 0000000000000000 by task syz-executor.0/11270
      
      CPU: 1 PID: 11270 Comm: syz-executor.0 Not tainted 5.15.0-syzkaller #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       __kasan_report mm/kasan/report.c:446 [inline]
       kasan_report.cold+0x66/0xdf mm/kasan/report.c:459
       check_region_inline mm/kasan/generic.c:183 [inline]
       kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
       instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
       atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline]
       sk_memory_allocated_add include/net/sock.h:1371 [inline]
       sock_reserve_memory net/core/sock.c:994 [inline]
       sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443
       __sys_setsockopt+0x4f8/0x610 net/socket.c:2172
       __do_sys_setsockopt net/socket.c:2187 [inline]
       __se_sys_setsockopt net/socket.c:2184 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2184
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f56076d5ae9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f5604c4b188 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007f56077e8f60 RCX: 00007f56076d5ae9
      RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000003
      RBP: 00007f560772ff25 R08: 000000000000fec7 R09: 0000000000000000
      R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fffb61a100f R14: 00007f5604c4b300 R15: 0000000000022000
       </TASK>
      
      Fixes: 2bb2f5fb ("net: add new socket option SO_RESERVE_MEM")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
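The shape of the guard described above can be sketched in userspace C. The structs below are simplified mocks, not the kernel's `struct sock` or `struct proto`, and the function names and error codes are assumptions chosen for illustration. The point is only that the accounting pointer is tested before it is dereferenced.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified mocks; the real kernel structures have many more fields. */
struct proto_mock {
    long *memory_allocated; /* NULL when the protocol does no accounting */
};

struct sock_mock {
    const struct proto_mock *prot;
    long reserved;
};

/* Mirrors the idea of a "does this protocol account memory?" helper. */
static bool has_account(const struct sock_mock *sk)
{
    return sk->prot->memory_allocated != NULL;
}

/* Reserve pages only when the accounting counter actually exists. */
static int reserve_memory(struct sock_mock *sk, long pages)
{
    if (pages < 0)
        return -EINVAL;
    if (!has_account(sk))
        return -EOPNOTSUPP; /* assumed error code for this sketch */

    *sk->prot->memory_allocated += pages; /* safe: pointer checked above */
    sk->reserved += pages;
    return 0;
}
```

Without the `has_account()` check, the write through `memory_allocated` would hit address 0, which matches the "Write of size 8 at addr 0000000000000000" line in the report.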
    • tcp: Use BIT() for OPTION_* constants · 3b65abb8
      Leonard Crestez committed
      Extending these flags using the existing (1 << x) pattern triggers
      complaints from checkpatch. Instead of ignoring checkpatch, modify the
      existing values to use the BIT(x) style in a separate commit.
      Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
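A minimal sketch of the style change described above, written for userspace. The `BIT()` definition here is a local stand-in for the kernel macro, and the `EX_OPT_*` names are hypothetical; they are not the actual OPTION_* constants from the TCP code.

```c
/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Old style: open-coded shifts (hypothetical flag names). */
#define EX_OPT_OLD_A (1 << 0)
#define EX_OPT_OLD_B (1 << 3)

/* New style: the same bit positions expressed with BIT(). */
#define EX_OPT_NEW_A BIT(0)
#define EX_OPT_NEW_B BIT(3)

/* Set one flag in a flags word. */
static unsigned long ex_set_flag(unsigned long flags, unsigned long flag)
{
    return flags | flag;
}

/* Test whether a flag is present in a flags word. */
static int ex_has_flag(unsigned long flags, unsigned long flag)
{
    return (flags & flag) != 0;
}
```

Both spellings select the same bit, so a conversion like this should not change behavior. One visible difference is that `BIT()` yields an `unsigned long` while `(1 << x)` yields an `int`.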
    • selftests: net: properly support IPv6 in GSO GRE test · a985442f
      Andrea Righi committed
      Explicitly pass -6 to netcat when the test is using IPv6 to prevent
      failures.
      
      Also make sure to pass "-N" to netcat to close the socket after EOF on
      the client side, otherwise we would always hit the timeout and the test
      would fail.
      
      Without this fix applied:
      
       TEST: GREv6/v4 - copy file w/ TSO                                   [FAIL]
       TEST: GREv6/v4 - copy file w/ GSO                                   [FAIL]
       TEST: GREv6/v6 - copy file w/ TSO                                   [FAIL]
       TEST: GREv6/v6 - copy file w/ GSO                                   [FAIL]
      
      With this fix applied:
      
       TEST: GREv6/v4 - copy file w/ TSO                                   [ OK ]
       TEST: GREv6/v4 - copy file w/ GSO                                   [ OK ]
       TEST: GREv6/v6 - copy file w/ TSO                                   [ OK ]
       TEST: GREv6/v6 - copy file w/ GSO                                   [ OK ]
      
      Fixes: 025efa0a ("selftests: add simple GSO GRE test")
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 03 Nov, 2021 6 commits
    • ice: Fix race conditions between virtchnl handling and VF ndo ops · e6ba5273
      Brett Creeley committed
      The VF can be configured via the PF's ndo ops at the same time the PF is
      receiving/handling virtchnl messages. This has many issues, with
      one of them being the ndo op could be actively resetting a VF (i.e.
      resetting it to the default state and deleting/re-adding the VF's VSI)
      while a virtchnl message is being handled. The following error was seen
      because a VF ndo op was used to change a VF's trust setting while the
      VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
      
      [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
      [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
      [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
      
      Fix this by making sure the virtchnl handling and VF ndo ops that
      trigger VF resets cannot run concurrently. This is done by adding a
      struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
      will be locked around the critical operations and VFR. Since the ndo ops
      will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
      is done because if any other thread (i.e. VF ndo op) has the mutex, then
      that means the current VF message being handled is no longer valid, so
      just ignore it.
      
      This issue can be seen using the following commands:
      
      for i in {0..50}; do
              rmmod ice
              modprobe ice
      
              sleep 1
      
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      
              sleep 2
      
              echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
              sleep 1
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      done
      
      Fixes: 7c710869 ("ice: Add handlers for VF netdevice operations")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
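The locking scheme described in the commit above (blocking lock on the configuration path, try-lock on the message path) can be approximated in userspace with POSIX threads. This is a sketch under assumptions: `struct vf_mock` and the `vf_mock_*` helpers are hypothetical, and `pthread_mutex_trylock()` stands in for the kernel's `mutex_trylock()`.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-VF state guarded by a configuration lock. */
struct vf_mock {
    pthread_mutex_t cfg_lock;
    int handled; /* messages actually processed */
    int resets;  /* configuration operations completed */
};

static void vf_mock_init(struct vf_mock *vf)
{
    pthread_mutex_init(&vf->cfg_lock, NULL);
    vf->handled = 0;
    vf->resets = 0;
}

/*
 * Message path: if the lock is already held, a configuration
 * operation is in flight and this message is stale, so drop it.
 */
static bool vf_mock_handle_msg(struct vf_mock *vf)
{
    if (pthread_mutex_trylock(&vf->cfg_lock) != 0)
        return false; /* ignored, lock busy */
    vf->handled++;
    pthread_mutex_unlock(&vf->cfg_lock);
    return true;
}

/* Configuration path: take the lock for the whole critical section. */
static void vf_mock_config_begin(struct vf_mock *vf)
{
    pthread_mutex_lock(&vf->cfg_lock);
}

static void vf_mock_config_end(struct vf_mock *vf)
{
    vf->resets++;
    pthread_mutex_unlock(&vf->cfg_lock);
}
```

The design choice is that the message handler never waits: a message that arrives while a reset is in progress refers to state that is about to be torn down, so skipping it is cheaper and safer than blocking.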
    • ice: Fix not stopping Tx queues for VFs · b385cca4
      Brett Creeley committed
      When a VF is removed and/or reset its Tx queues need to be
      stopped from the PF. This is done by calling the ice_dis_vf_qs()
      function, which calls ice_vsi_stop_lan_tx_rings(). Currently
      ice_dis_vf_qs() is protected by the VF state bit ICE_VF_STATE_QS_ENA.
      Unfortunately, this is causing the Tx queues to not be disabled in some
      cases and when the VF tries to re-enable/reconfigure its Tx queues over
      virtchnl the op is failing. This is because a VF can be reset and/or
      removed before the ICE_VF_STATE_QS_ENA bit is set, but the Tx queues
      were already configured via ice_vsi_cfg_single_txq() in the
      VIRTCHNL_OP_CONFIG_VSI_QUEUES op. However, the ICE_VF_STATE_QS_ENA bit
      is set on a successful VIRTCHNL_OP_ENABLE_QUEUES, which will always
      happen after the VIRTCHNL_OP_CONFIG_VSI_QUEUES op.
      
      This was causing the following error message when loading the ice
      driver, creating VFs, and modifying VF trust in an endless loop:
      
      [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
      [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
      [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
      
      Fix this by always calling ice_dis_vf_qs() and silencing the error
      message in ice_vsi_stop_tx_ring() since the calling code ignores the
      return anyway. Also, all other places that call ice_vsi_stop_tx_ring()
      catch the error, so this doesn't affect those flows since there was no
      change to the values the function returns.
      
      Other solutions were considered (e.g. tracking which VF queues had been
      "started/configured" in VIRTCHNL_OP_CONFIG_VSI_QUEUES), but they seemed
      more complicated than they were worth. Such tracking also brings in the
      chance for other unexpected conditions due to invalid state bit checks.
      So, the proposed solution seemed like the best option since there is no
      harm in failing to stop Tx queues that were never started.
      
      This issue can be seen using the following commands:
      
      for i in {0..50}; do
              rmmod ice
              modprobe ice
      
              sleep 1
      
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      
              sleep 2
      
              echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
              sleep 1
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      done
      
      Fixes: 77ca27c4 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix replacing VF hardware MAC to existing MAC filter · ce572a5b
      Sylwester Dziedziuch committed
      VF was not able to change its hardware MAC address in case
      the new address was already present in the MAC filter list.
      Change the handling of VF add mac request to not return
      if requested MAC address is already present on the list
      and check if its hardware MAC needs to be updated in this case.
      
      Fixes: ed4c068d ("ice: Enable ip link show on the PF to display VF unicast MAC(s)")
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Remove toggling of antispoof for VF trusted promiscuous mode · 0299faea
      Brett Creeley committed
      Currently, when a trusted VF enables promiscuous mode, spoofchk will be
      disabled. This is wrong; spoofchk should only be modified from the
      ndo_set_vf_spoofchk callback. Fix this by removing the call to toggle
      spoofchk for trusted VFs.
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix VF true promiscuous mode · 1a8c7778
      Brett Creeley committed
      When a VF requests promiscuous mode, it is trusted, and true promiscuous
      mode is enabled, the PF driver attempts to enable unicast and/or
      multicast promiscuous mode filters based on the request. This is fine,
      but there are a couple of issues with the current code.
      
      [1] The define to configure the unicast promiscuous mode mask also
          includes bits to configure the multicast promiscuous mode mask, which
          causes multicast to be set/cleared unintentionally.
      [2] All 4 cases for enable/disable unicast/multicast mode are not
          handled in the promiscuous mode message handler, which causes
          unexpected results regarding the current promiscuous mode settings.
      
      To fix [1] make sure any promiscuous mask defines include the correct
      bits for each of the promiscuous modes.
      
      To fix [2] make sure that all 4 cases are handled since there are 2 bits
      (FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC) that can be
      either set or cleared. Also, since either unicast and/or multicast
      promiscuous configuration can fail, introduce two separate error values
      to handle each of these cases.
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
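The second fix above (handle every combination of the two request bits, and report unicast and multicast failures separately) can be sketched as follows. Everything here is illustrative: the `EX_FLAG_*` values, `struct ex_promisc_state`, and `ex_apply_promisc()` are hypothetical names, and the `fail_mask` parameter merely simulates a filter update that fails.

```c
#include <stdbool.h>

/* Hypothetical request bits, one per promiscuous mode. */
#define EX_FLAG_UNICAST_PROMISC   0x1u
#define EX_FLAG_MULTICAST_PROMISC 0x2u

struct ex_promisc_state {
    bool ucast;
    bool mcast;
};

/*
 * Apply a request to the current state. Each mode is handled on its
 * own, so all four set/clear combinations are covered and a failure
 * in one mode does not mask the result of the other.
 * fail_mask simulates which mode's filter update fails.
 */
static void ex_apply_promisc(struct ex_promisc_state *st, unsigned int flags,
                             unsigned int fail_mask,
                             int *ucast_err, int *mcast_err)
{
    bool want_ucast = (flags & EX_FLAG_UNICAST_PROMISC) != 0;
    bool want_mcast = (flags & EX_FLAG_MULTICAST_PROMISC) != 0;

    *ucast_err = (fail_mask & EX_FLAG_UNICAST_PROMISC) ? -1 : 0;
    *mcast_err = (fail_mask & EX_FLAG_MULTICAST_PROMISC) ? -1 : 0;

    /* Only commit the new setting for a mode whose update succeeded. */
    if (*ucast_err == 0)
        st->ucast = want_ucast;
    if (*mcast_err == 0)
        st->mcast = want_mcast;
}
```

Keeping two error values lets the caller tell which mode failed, which a single combined return code could not express.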
    • selftests/bpf: Verifier test on refill from a smaller spill · c08455de
      Martin KaFai Lau committed
      This patch adds a verifier test to ensure the verifier can read 8 bytes
      from the stack after two 32-bit writes at fp-4 and fp-8. The test is similar
      to the reported case from bcc [0].
      
        [0] https://github.com/iovisor/bcc/pull/3683
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20211102064541.316414-1-kafai@fb.com