1. 19 Mar, 2021 4 commits
  2. 18 Mar, 2021 5 commits
    • bpf, devmap: Move drop error path to devmap for XDP_REDIRECT · fdc13979
      Committed by Lorenzo Bianconi
      We want to change the current ndo_xdp_xmit drop semantics because it will
      allow us to implement better queue overflow handling. This is working
      towards the larger goal of an XDP TX queue-hook. Move XDP_REDIRECT error
      path handling from each XDP ethernet driver to devmap code. Under the
      new API, the driver implementing ndo_xdp_xmit breaks out of its tx loop
      whenever the hw reports a tx error and returns only the number of
      successfully transmitted frames to the devmap caller. Freeing the
      dropped frames becomes the responsibility of devmap.
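      The split of responsibilities described above can be sketched in plain
      userspace C. This is only an illustration under stated assumptions:
      toy_frame, toy_ndo_xmit() and toy_devmap_flush() are hypothetical names,
      not kernel APIs, and freeing the sent frames merely stands in for a TX
      completion.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for struct xdp_frame. */
      struct toy_frame {
      	int hw_ok;	/* 0 simulates a hardware TX error for this frame */
      };

      /*
       * Hypothetical driver hook under the new semantics: stop at the first
       * hardware error and report how many frames were actually sent. The
       * driver does not free the frames it failed to send.
       */
      static int toy_ndo_xmit(struct toy_frame **frames, int n)
      {
      	int sent = 0;

      	assert(frames != NULL && n >= 0);
      	while (sent < n && frames[sent]->hw_ok)
      		sent++;
      	return sent;
      }

      /*
       * Hypothetical devmap-side caller: it owns the error path and frees
       * every frame the driver did not send. Returns the number of drops.
       */
      static int toy_devmap_flush(struct toy_frame **frames, int n)
      {
      	int sent = toy_ndo_xmit(frames, n);
      	int i;

      	for (i = 0; i < n; i++) {
      		/* i < sent: simulated TX completion; i >= sent: dropped frame */
      		free(frames[i]);
      		frames[i] = NULL;
      	}
      	return n - sent;
      }
      ```

      With four frames where the third one fails, toy_ndo_xmit() reports two
      sent frames and the caller accounts for two drops, including the frame
      queued behind the failing one.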
      
      Move each XDP ndo_xdp_xmit capable driver to the new APIs:
      
      - veth
      - virtio-net
      - mvneta
      - mvpp2
      - socionext
      - amazon ena
      - bnxt
      - freescale (dpaa2, dpaa)
      - xen-frontend
      - qede
      - ice
      - igb
      - ixgbe
      - i40e
      - mlx5
      - ti (cpsw, cpsw-new)
      - tun
      - sfc
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
      Acked-by: Edward Cree <ecree.xilinx@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Shay Agroskin <shayagr@amazon.com>
      Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org
    • Merge branch 'Provide NULL and KERNEL_VERSION macros in bpf_helpers.h' · 6b282765
      Committed by Alexei Starovoitov
      Andrii Nakryiko says:
      
      ====================
      
      Provide NULL and KERNEL_VERSION macros in bpf_helpers.h. Patch #2 removes such
      custom NULL definition from one of the selftests.
      
      v2->v3:
        - instead of vmlinux.h, do this in bpf_helpers.h;
        - added KERNEL_VERSION, which comes up periodically as well;
        - I dropped strict compilation patches for now, because we run into new
          warnings (e.g., not checking read() result) in kernel-patches CI, which
          I can't even reproduce locally. Also -Wdiscarded-qualifiers pragma for
          jit_disasm.c is not supported by Clang, it needs to be
          -Wincompatible-pointer-types-discards-qualifiers for Clang; we don't have
          to deal with that in this patch set;
      v1->v2:
        - fix few typos and wrong copy/paste;
        - fix #pragma push -> pop.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: drop custom NULL #define in skb_pkt_end selftest · c53a3355
      Committed by Andrii Nakryiko
      Now that bpftool generates NULL definition as part of vmlinux.h, drop custom
      NULL definition in skb_pkt_end.c.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20210317200510.1354627-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: provide NULL and KERNEL_VERSION macros in bpf_helpers.h · 9ae2c26e
      Committed by Andrii Nakryiko
      Given that vmlinux.h is not compatible with headers like stddef.h, NULL poses
      an annoying problem: it is defined as #define, so is not captured in BTF, so
      is not emitted into vmlinux.h. This leads to users either sticking to explicit
      0, or defining their own NULL (as progs/skb_pkt_end.c does).
      
      But it's easy for bpf_helpers.h to provide (conditionally) NULL definition.
      Similarly, KERNEL_VERSION is another commonly missed macro that came up
      multiple times. So this patch adds both of them, along with offsetof(), that
      also is typically defined in stddef.h, just like NULL.
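
      A minimal sketch of such conditional definitions follows. It is
      compiled here as an ordinary host C file so the values can be checked;
      the exact definitions in bpf_helpers.h may differ, demo_pair is a
      hypothetical struct, and __builtin_offsetof is assumed to be available
      (it is a GCC/Clang builtin).

      ```c
      #include <assert.h>	/* static_assert, for the host-side checks only */

      /* Define each macro only if nothing earlier already provided it. */
      #ifndef NULL
      #define NULL ((void *)0)
      #endif

      #ifndef KERNEL_VERSION
      /* Pack major.minor.patch into a single comparable integer. */
      #define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
      #endif

      #ifndef offsetof
      #define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
      #endif

      /* Hypothetical struct, used only by the checks below. */
      struct demo_pair {
      	int first;
      	int second;
      };

      static_assert(KERNEL_VERSION(5, 12, 0) == 330752,
      	      "(5 << 16) + (12 << 8) + 0");
      static_assert(offsetof(struct demo_pair, second) == sizeof(int),
      	      "second is laid out directly after first");
      ```

      Because every definition sits behind #ifndef, a header written this way
      stays silent when it is included after another definition of the same
      macro. The redefinition warning shows up in the opposite order, when a
      program defines its own NULL after the header already did.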
      
      This might cause compilation warning for existing BPF applications defining
      their own NULL and/or KERNEL_VERSION already:
      
        progs/skb_pkt_end.c:7:9: warning: 'NULL' macro redefined [-Wmacro-redefined]
        #define NULL 0
                ^
        /tmp/linux/tools/testing/selftests/bpf/tools/include/vmlinux.h:4:9: note: previous definition is here
        #define NULL ((void *)0)
      	  ^
      
      It is trivial to fix, though, so long-term benefits outweigh temporary
      inconveniences.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20210317200510.1354627-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: net: Emit anonymous enum with BPF_TCP_CLOSE value explicitly · 97a19caf
      Committed by Yonghong Song
      The selftest failed to compile with clang-built bpf-next.
      Adding LLVM=1 to your vmlinux and selftest build will use clang.
      The error message is:
        progs/test_sk_storage_tracing.c:38:18: error: use of undeclared identifier 'BPF_TCP_CLOSE'
                if (newstate == BPF_TCP_CLOSE)
                                ^
        1 error generated.
        make: *** [Makefile:423: /bpf-next/tools/testing/selftests/bpf/test_sk_storage_tracing.o] Error 1
      
      The reason for the failure is that BPF_TCP_CLOSE, a value of
      an anonymous enum defined in uapi bpf.h, is not defined in
      vmlinux.h. gcc does not have this problem. Since vmlinux.h
      is derived from BTF which is derived from vmlinux DWARF,
      that means gcc-produced vmlinux DWARF has BPF_TCP_CLOSE
      while llvm-produced vmlinux DWARF does not.
      
      BPF_TCP_CLOSE is referenced in net/ipv4/tcp.c as
        BUILD_BUG_ON((int)BPF_TCP_CLOSE != (int)TCP_CLOSE);
      The following test mimics the above BUILD_BUG_ON, preprocessed
      with clang compiler, and shows gcc DWARF contains BPF_TCP_CLOSE while
      llvm DWARF does not.
      
        $ cat t.c
        enum {
          BPF_TCP_ESTABLISHED = 1,
          BPF_TCP_CLOSE = 7,
        };
        enum {
          TCP_ESTABLISHED = 1,
          TCP_CLOSE = 7,
        };
      
        int test() {
          do {
            extern void __compiletime_assert_767(void) ;
            if ((int)BPF_TCP_CLOSE != (int)TCP_CLOSE) __compiletime_assert_767();
          } while (0);
          return 0;
        }
        $ clang t.c -O2 -c -g && llvm-dwarfdump t.o | grep BPF_TCP_CLOSE
        $ gcc t.c -O2 -c -g && llvm-dwarfdump t.o | grep BPF_TCP_CLOSE
                          DW_AT_name    ("BPF_TCP_CLOSE")
      
      Further checking of the clang code shows that clang actually tries to
      evaluate the condition at compile time. If it is definitely
      true/false, it will perform optimization and the whole if condition
      will be removed before generating IR/debuginfo.
      
      This patch explicitly adds an expression after the
      above mentioned BUILD_BUG_ON in net/ipv4/tcp.c like
        (void)BPF_TCP_ESTABLISHED
      to enable generation of debuginfo for the anonymous
      enum which also includes BPF_TCP_CLOSE.
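
      Applied to the t.c reproducer above, the workaround might look like the
      sketch below. This is an assumption-laden illustration, not the patch
      itself: check_tcp_states() is a hypothetical name, and static_assert
      stands in for the kernel's BUILD_BUG_ON so that the file links on its
      own.

      ```c
      #include <assert.h>

      enum {
      	BPF_TCP_ESTABLISHED = 1,
      	BPF_TCP_CLOSE = 7,
      };
      enum {
      	TCP_ESTABLISHED = 1,
      	TCP_CLOSE = 7,
      };

      int check_tcp_states(void)
      {
      	/* Stand-in for BUILD_BUG_ON: evaluated and discarded at compile time. */
      	static_assert((int)BPF_TCP_CLOSE == (int)TCP_CLOSE,
      		      "BPF and TCP state values must match");

      	/*
      	 * Extra expression naming an enumerator of the first anonymous enum.
      	 * Per the commit message, a reference like this is what lets clang
      	 * generate debuginfo for the enum, BPF_TCP_CLOSE included.
      	 */
      	(void)BPF_TCP_ESTABLISHED;
      	return 0;
      }
      ```

      Whether a particular clang version then emits the enumerator can be
      checked the same way as in the reproducer, by compiling with -g and
      grepping the llvm-dwarfdump output for BPF_TCP_CLOSE.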
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210317174132.589276-1-yhs@fb.com
  3. 17 Mar, 2021 10 commits
  4. 16 Mar, 2021 4 commits
  5. 11 Mar, 2021 5 commits
  6. 10 Mar, 2021 12 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · c1acda98
      Committed by David S. Miller
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2021-03-09
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      We've added 90 non-merge commits during the last 17 day(s) which contain
      a total of 114 files changed, 5158 insertions(+), 1288 deletions(-).
      
      The main changes are:
      
      1) Faster bpf_redirect_map(), from Björn.
      
      2) skmsg cleanup, from Cong.
      
      3) Support for floating point types in BTF, from Ilya.
      
      4) Documentation for sys_bpf commands, from Joe.
      
      5) Support for sk_lookup in bpf_prog_test_run, from Lorenz.
      
      6) Enable task local storage for tracing programs, from Song.
      
      7) bpf_for_each_map_elem() helper, from Yonghong.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net · 05a59d79
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
      
       1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.
      
       2) TX skb error handling fix in mt76 driver, also from Felix.
      
       3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.
      
       4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
          From Cong Wang.
      
       5) Use correct printf format for dma_addr_t in ath11k, from Geert
          Uytterhoeven.
      
       6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
          Hsieh.
      
       7) Don't report truncated frames to mac80211 in mt76 driver, from
          Lorenzo Bianconi.
      
       8) Fix watchdog timeout on suspend/resume of stmmac, from Joakim Zhang.
      
       9) mscc ocelot needs NET_DEVLINK select in Kconfig, from Arnd Bergmann.
      
      10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
          Arjun Roy.
      
      11) Ignore routes with deleted nexthop object in mlxsw, from Ido
          Schimmel.
      
      12) Need to undo tcp early demux lookup sometimes in nf_nat, from
          Florian Westphal.
      
      13) Fix gro aggregation for udp encaps with zero csum, from Daniel
          Borkmann.
      
      14) Make sure to always use icmp*_ndo_send when necessary, from Jason A.
          Donenfeld.
      
      15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.
      
      16) Prevent overly huge skb allocations in qrtr, from Pavel Skripkin.
      
      17) Prevent rx ring consumer index loss of sync in enetc, from Vladimir
          Oltean.
      
      18) Make sure textsearch control block is large enough, from Willem de
          Bruijn.
      
      19) Revert MAC changes to r8152 leading to instability, from Hayes Wang.
      
      20) Advance iov in 9p even for empty reads, from Jisheng Zhang.
      
      21) Double hook unregister in nftables, from Pablo Neira Ayuso.
      
      22) Fix memleak in ixgbe, from Dinghao Liu.
      
      23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.
      
      24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
          Tang.
      
      25) Fix DOI refcount bugs in cipso, from Paul Moore.
      
      26) One too many irqsave in ibmvnic, from Junlin Yang.
      
      27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
          Balazs Nemeth.
      
      * git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
        s390/qeth: fix notification for pending buffers during teardown
        s390/qeth: schedule TX NAPI on QAOB completion
        s390/qeth: improve completion of pending TX buffers
        s390/qeth: fix memory leak after failed TX Buffer allocation
        net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
        net: check if protocol extracted by virtio_net_hdr_set_proto is correct
        net: dsa: xrs700x: check if partner is same as port in hsr join
        net: lapbether: Remove netif_start_queue / netif_stop_queue
        atm: idt77252: fix null-ptr-dereference
        atm: uPD98402: fix incorrect allocation
        atm: fix a typo in the struct description
        net: qrtr: fix error return code of qrtr_sendmsg()
        mptcp: fix length of ADD_ADDR with port sub-option
        net: bonding: fix error return code of bond_neigh_init()
        net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
        net: enetc: set MAC RX FIFO to recommended value
        net: davicom: Use platform_get_irq_optional()
        net: davicom: Fix regulator not turned off on driver removal
        net: davicom: Fix regulator not turned off on failed probe
        net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
        ...
    • Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc · 6a30bedf
      Committed by Linus Torvalds
      Pull sparc fixes from David Miller:
       "Fix opcode filtering for exceptions, and clean up defconfig"
      
      * git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
        sparc: sparc64_defconfig: remove duplicate CONFIGs
        sparc64: Fix opcode filtering in handling of no fault loads
    • sparc: sparc64_defconfig: remove duplicate CONFIGs · 69264b4a
      Committed by Corentin Labbe
      After my patch, CONFIG_ATA is defined twice.
      Remove the duplicate one.
      The same problem exists for CONFIG_HAPPYMEAL, except that I added it as
      builtin for boot tests with NFS.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Fixes: a57cdeb3 ("sparc: sparc64_defconfig: add necessary configs for qemu")
      Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Fix opcode filtering in handling of no fault loads · e5e8b80d
      Committed by Rob Gardner
      is_no_fault_exception() has two bugs which were discovered via random
      opcode testing with stress-ng. Both are caused by improper filtering
      of opcodes.
      
      The first bug can be triggered by a floating point store with a no-fault
      ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.
      
      The code first tests op3[5] (0x1000000), which denotes a floating
      point instruction, and then tests op3[2] (0x200000), which denotes a
      store instruction. But these bits are not mutually exclusive, and the
      above mentioned opcode has both bits set. The intent is to filter out
      stores, so the test for stores must be done first in order to have
      any effect.
      
      The second bug can be triggered by a floating point load with one of
      the invalid ASI values 0x8e or 0x8f, which pass this check in
      is_no_fault_exception():
           if ((asi & 0xf2) == ASI_PNF)
      
      An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
      opcode CF95D1EF. ASI values greater than 0x8b (ASI_SNFL) are fatal
      in handle_ldf_stq(), and is_no_fault_exception() must not allow these
      invalid ASI values to make it that far.
      
      In both of these cases, handle_ldf_stq() reacts by calling
      sun4v_data_access_exception() or spitfire_data_access_exception(),
      which call is_no_fault_exception() again, resulting in infinite
      recursion.
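
      Both filtering problems can be reproduced with a few lines of
      standalone C. The bit masks, the example opcode and the 0xf2 ASI mask
      come from the description above, and ASI_PNF is 0x82. The helper names
      are hypothetical, and the tighter 0xf6 mask is an assumption chosen for
      illustration, not necessarily what the patch uses.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      #define OP3_FP_BIT	0x1000000u	/* op3[5]: floating point instruction */
      #define OP3_STORE_BIT	0x200000u	/* op3[2]: store instruction */
      #define ASI_PNF		0x82u		/* primary no-fault ASI */

      /* The example opcode "sta %f0, [%g0] #ASI_PNF" has both bits set. */
      static_assert((0xC1A01040u & OP3_FP_BIT) && (0xC1A01040u & OP3_STORE_BIT),
      	      "FP and store bits are not mutually exclusive");

      /* Buggy ordering: the FP test wins, so FP stores are never filtered. */
      static bool fp_first_accepts(uint32_t insn)
      {
      	if (insn & OP3_FP_BIT)
      		return true;	/* would be handed to the FP load path */
      	if (insn & OP3_STORE_BIT)
      		return false;
      	return true;
      }

      /* Store test first: every store is rejected before anything else. */
      static bool store_first_accepts(uint32_t insn)
      {
      	if (insn & OP3_STORE_BIT)
      		return false;
      	return true;
      }

      /* Mask quoted above: also matches the invalid ASIs 0x8e and 0x8f. */
      static bool asi_loose_match(uint32_t asi)
      {
      	return (asi & 0xf2u) == ASI_PNF;
      }

      /* Assumed tighter mask that additionally tests bit 2 of the ASI. */
      static bool asi_tight_match(uint32_t asi)
      {
      	return (asi & 0xf6u) == ASI_PNF;
      }
      ```

      With the tighter mask, only 0x82, 0x83, 0x8a and 0x8b match, so nothing
      above 0x8b (ASI_SNFL) gets through.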
      Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
      Tested-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 's390-qeth-fixes' · 85154557
      Committed by David S. Miller
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2021-03-09
      
      Please apply the following patch series to netdev's net tree.
      
      This brings one fix for a memleak in an error path of the setup code.
      Also several fixes for dealing with pending TX buffers - two for old
      bugs in their completion handling, and one recent regression in a
      teardown path.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix notification for pending buffers during teardown · 7eefda7f
      Committed by Julian Wiedmann
      The cited commit reworked the state machine for pending TX buffers.
      In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
      uses NEED_QAOB for buffers that get parked while waiting for their QAOB
      completion.
      
      But it failed to adjust the check in qeth_tx_complete_buf(). So if
      qeth_tx_complete_pending_bufs() is called during teardown to drain
      the parked TX buffers, we no longer raise a notification for af_iucv.
      
      Instead of updating the checked state, just move this code into
      qeth_tx_complete_pending_bufs() itself. This also gets rid of the
      special-case in the common TX completion path.
      
      Fixes: 8908f36d ("s390/qeth: fix af_iucv notification race")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: schedule TX NAPI on QAOB completion · 3e83d467
      Committed by Julian Wiedmann
      When a QAOB notifies us that a pending TX buffer has been delivered, the
      actual TX completion processing by qeth_tx_complete_pending_bufs()
      is done within the context of a TX NAPI instance. We shouldn't rely on
      this instance being scheduled by some other TX event, but just do it
      ourselves.
      
      qeth_qdio_handle_aob() is called from qeth_poll(), i.e. our main NAPI
      instance. To avoid touching the TX queue's NAPI instance
      before/after it is (un-)registered, reorder the code in qeth_open()
      and qeth_stop() accordingly.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: improve completion of pending TX buffers · c20383ad
      Committed by Julian Wiedmann
      The current design attaches a pending TX buffer to a custom
      single-linked list, which is anchored at the buffer's slot on the
      TX ring. The buffer is then checked for final completion whenever
      this slot is processed during a subsequent TX NAPI poll cycle.
      
      But if there's insufficient traffic on the ring, we might never make
      enough progress to get back to this ring slot and discover the pending
      buffer's final TX completion. This matters in particular if the missing
      TX completion blocks the application from sending further traffic.
      
      So convert the custom single-linked list code to a per-queue list_head,
      and scan this list on every TX NAPI cycle.
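
      The idea of one per-queue list that is walked on every poll cycle can
      be sketched as follows. All names are hypothetical, and a hand-rolled
      singly linked list replaces the kernel's list_head so that the example
      stays self-contained.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical pending TX buffer; `delivered` is set asynchronously. */
      struct toy_buf {
      	bool delivered;
      	struct toy_buf *next;
      };

      /* Hypothetical TX queue holding one list of all pending buffers. */
      struct toy_queue {
      	struct toy_buf *pending;
      };

      /* Park a buffer on the queue until its delivery is confirmed. */
      static void toy_queue_park(struct toy_queue *q, struct toy_buf *buf)
      {
      	assert(q != NULL && buf != NULL);
      	buf->next = q->pending;
      	q->pending = buf;
      }

      /*
       * Called on every poll cycle: walk the whole per-queue list and unlink
       * each buffer whose delivery has been confirmed, independent of which
       * ring slot it was sent from. Returns the number of completed buffers.
       */
      static int toy_queue_poll(struct toy_queue *q)
      {
      	struct toy_buf **link = &q->pending;
      	int completed = 0;

      	while (*link) {
      		struct toy_buf *buf = *link;

      		if (buf->delivered) {
      			*link = buf->next;	/* unlink; caller still owns buf */
      			buf->next = NULL;
      			completed++;
      		} else {
      			link = &buf->next;
      		}
      	}
      	return completed;
      }
      ```

      Since the scan does not depend on ring progress, a delivered buffer is
      found on the next poll even when no further traffic arrives on the
      queue.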
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix memory leak after failed TX Buffer allocation · e7a36d27
      Committed by Julian Wiedmann
      When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
      back an Output Queue, the 'out_freeoutqbufs' path will free all
      previously allocated buffers for this queue. But it fails to free the
      half-finished queue struct itself.
      
      Move the buffer allocation into qeth_alloc_output_queue(), and deal with
      such errors internally.
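
      A userspace sketch of this pattern, with the queue struct and its
      buffers allocated and unwound in one function, could look like this.
      The names, the buffer count and the fail_at test hook are assumptions
      made for illustration and do not mirror the driver's code.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      #define TOY_BUFS_PER_QUEUE 4	/* hypothetical; not the real ring size */

      struct toy_out_queue {
      	void *bufs[TOY_BUFS_PER_QUEUE];
      };

      /*
       * Allocate the queue struct and all of its buffers in one place.
       * `fail_at` is a test hook: the buffer with that index fails to
       * allocate; pass -1 for no injected failure. On any failure, all
       * buffers allocated so far and the queue struct itself are freed.
       */
      static struct toy_out_queue *toy_alloc_output_queue(int fail_at)
      {
      	struct toy_out_queue *q;
      	int i;

      	assert(fail_at < TOY_BUFS_PER_QUEUE);
      	q = calloc(1, sizeof(*q));
      	if (!q)
      		return NULL;

      	for (i = 0; i < TOY_BUFS_PER_QUEUE; i++) {
      		q->bufs[i] = (i == fail_at) ? NULL : malloc(64);
      		if (!q->bufs[i])
      			goto err_free;
      	}
      	return q;

      err_free:
      	while (i-- > 0)
      		free(q->bufs[i]);
      	free(q);	/* the step the old error path missed */
      	return NULL;
      }

      /* Release a fully allocated queue. */
      static void toy_free_output_queue(struct toy_out_queue *q)
      {
      	int i;

      	if (!q)
      		return;
      	for (i = 0; i < TOY_BUFS_PER_QUEUE; i++)
      		free(q->bufs[i]);
      	free(q);
      }
      ```

      Keeping allocation and unwinding in the same function means the caller
      only ever sees a complete queue or NULL, so no half-finished struct is
      left for an outer error path to forget.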
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'virtio_net-infinite-loop' · b005c9ef
      Committed by David S. Miller
      Balazs Nemeth says:
      
      ====================
      net: prevent infinite loop caused by incorrect proto from virtio_net_hdr_set_proto
      
      These patches prevent an infinite loop for gso packets with a protocol
      from virtio net hdr that doesn't match the protocol in the packet.
      Note that packets coming from a device without
      header_ops->parse_protocol being implemented will not be caught by
      the check in virtio_net_hdr_to_skb, but the infinite loop will still
      be prevented by the check in the gso layer.
      
      Changes from v2 to v3:
        - Remove unused *eth.
        - Use MPLS_HLEN to also check if the MPLS header length is a multiple
          of four.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 · d348ede3
      Committed by Balazs Nemeth
      A packet with skb_inner_network_header(skb) == skb_network_header(skb)
      and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
      from the packet. Subsequently, the call to skb_mac_gso_segment will
      again call mpls_gso_segment with the same packet leading to an infinite
      loop. In addition, ensure that the header length is a multiple of four,
      which should hold irrespective of the number of stacked labels.
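
      The validity condition described here can be sketched as a small
      predicate. The helper names and TOY_MPLS_HLEN are hypothetical, and the
      only fact relied on is that one MPLS label stack entry is four bytes
      long.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      #define TOY_MPLS_HLEN 4u	/* one MPLS label stack entry is 4 bytes */

      /*
       * mpls_hlen is the distance between the network header and the inner
       * network header. Zero means there is nothing to pull, so segmenting
       * would be re-entered with the same packet. Any other value must be a
       * whole number of label stack entries.
       */
      static bool toy_mpls_hlen_ok(unsigned int mpls_hlen)
      {
      	return mpls_hlen != 0 && mpls_hlen % TOY_MPLS_HLEN == 0;
      }

      /* Number of stacked labels implied by a valid header length. */
      static unsigned int toy_mpls_label_count(unsigned int mpls_hlen)
      {
      	assert(toy_mpls_hlen_ok(mpls_hlen));
      	return mpls_hlen / TOY_MPLS_HLEN;
      }
      ```

      A length of 0 or 6 is rejected, while 4, 8 or 12 is accepted, whatever
      the depth of the label stack.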
      Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>