1. 10 11月, 2021 2 次提交
  2. 09 11月, 2021 10 次提交
    • A
      amt: add IPV6 Kconfig dependency · 9758aba8
      Arnd Bergmann 提交于
      This driver cannot be built-in if IPV6 is a loadable module:
      
      x86_64-linux-ld: drivers/net/amt.o: in function `amt_build_mld_gq':
      amt.c:(.text+0x2e7d): undefined reference to `ipv6_dev_get_saddr'
      
      Add the idiomatic Kconfig dependency that all such modules
      have.
      
      Fixes: b9022b53 ("amt: add control plane of amt interface")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9758aba8
    • D
      gve: Fix off by one in gve_tx_timeout() · 1c360cc1
      Dan Carpenter 提交于
      The priv->ntfy_blocks[] has "priv->num_ntfy_blks" elements so this >
      needs to be >= to prevent an off by one bug.  The priv->ntfy_blocks[]
      array is allocated in gve_alloc_notify_blocks().
      
      Fixes: 87a7f321 ("gve: Recover from queue stall due to missed IRQ")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c360cc1
    • L
      hamradio: defer 6pack kfree after unregister_netdev · 0b911192
      Lin Ma 提交于
      There is a possible race condition (use-after-free) like below
      
       (USE)                       |  (FREE)
        dev_queue_xmit             |
         __dev_queue_xmit          |
          __dev_xmit_skb           |
           sch_direct_xmit         | ...
            xmit_one               |
             netdev_start_xmit     | tty_ldisc_kill
              __netdev_start_xmit  |  6pack_close
               sp_xmit             |   kfree
                sp_encaps          |
                                   |
      
      According to the patch "defer ax25 kfree after unregister_netdev", this
      patch reorder the kfree after the unregister_netdev to avoid the possible
      UAF as the unregister_netdev() is well synchronized and won't return if
      there is a running routine.
      Signed-off-by: NLin Ma <linma@zju.edu.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b911192
    • L
      hamradio: defer ax25 kfree after unregister_netdev · 3e0588c2
      Lin Ma 提交于
      There is a possible race condition (use-after-free) like below
      
       (USE)                       |  (FREE)
      ax25_sendmsg                 |
       ax25_queue_xmit             |
        dev_queue_xmit             |
         __dev_queue_xmit          |
          __dev_xmit_skb           |
           sch_direct_xmit         | ...
            xmit_one               |
             netdev_start_xmit     | tty_ldisc_kill
              __netdev_start_xmit  |  mkiss_close
               ax_xmit             |   kfree
                ax_encaps          |
                                   |
      
      Even though there are two synchronization primitives before the kfree:
      1. wait_for_completion(&ax->dead). This can prevent the race with
      routines from mkiss_ioctl. However, it cannot stop the routine coming
      from upper layer, i.e., the ax25_sendmsg.
      
      2. netif_stop_queue(ax->dev). It seems that this line of code aims to
      halt the transmit queue but it fails to stop the routine that already
      being xmit.
      
      This patch reorder the kfree after the unregister_netdev to avoid the
      possible UAF as the unregister_netdev() is well synchronized and won't
      return if there is a running routine.
      Signed-off-by: NLin Ma <linma@zju.edu.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e0588c2
    • J
      net: sungem_phy: fix code indentation · 54f0bad6
      Jean Sacren 提交于
      Remove extra space in front of the return statement.
      
      Fixes: eb5b5b2f ("sungem_phy: support bcm5461 phy, autoneg.")
      Signed-off-by: NJean Sacren <sakiwit@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54f0bad6
    • J
      bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg · b2c46181
      Jussi Maki 提交于
      The current conversion of skb->data_end reads like this:
      
        ; data_end = (void*)(long)skb->data_end;
         559: (79) r1 = *(u64 *)(r2 +200)   ; r1  = skb->data
         560: (61) r11 = *(u32 *)(r2 +112)  ; r11 = skb->len
         561: (0f) r1 += r11
         562: (61) r11 = *(u32 *)(r2 +116)
         563: (1f) r1 -= r11
      
      But similar to the case in 84f44df6 ("bpf: sock_ops sk access may stomp
      registers when dst_reg = src_reg"), the code will read an incorrect skb->len
      when src == dst. In this case we end up generating this xlated code:
      
        ; data_end = (void*)(long)skb->data_end;
         559: (79) r1 = *(u64 *)(r1 +200)   ; r1  = skb->data
         560: (61) r11 = *(u32 *)(r1 +112)  ; r11 = (skb->data)->len
         561: (0f) r1 += r11
         562: (61) r11 = *(u32 *)(r1 +116)
         563: (1f) r1 -= r11
      
      ... where line 560 is the reading 4B of (skb->data + 112) instead of the
      intended skb->len Here the skb pointer in r1 gets set to skb->data and the
      later deref for skb->len ends up following skb->data instead of skb.
      
      This fixes the issue similarly to the patch mentioned above by creating an
      additional temporary variable and using to store the register when dst_reg =
      src_reg. We name the variable bpf_temp_reg and place it in the cb context for
      sk_skb. Then we restore from the temp to ensure nothing is lost.
      
      Fixes: 16137b09 ("bpf: Compute data_end dynamically with JIT code")
      Signed-off-by: NJussi Maki <joamaki@gmail.com>
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-6-john.fastabend@gmail.com
      b2c46181
    • J
      bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding · e0dc3b93
      John Fastabend 提交于
      Strparser is reusing the qdisc_skb_cb struct to stash the skb message handling
      progress, e.g. offset and length of the skb. First this is poorly named and
      inherits a struct from qdisc that doesn't reflect the actual usage of cb[] at
      this layer.
      
      But, more importantly strparser is using the following to access its metadata.
      
        (struct _strp_msg *)((void *)skb->cb + offsetof(struct qdisc_skb_cb, data))
      
      Where _strp_msg is defined as:
      
        struct _strp_msg {
              struct strp_msg            strp;                 /*     0     8 */
              int                        accum_len;            /*     8     4 */
      
              /* size: 12, cachelines: 1, members: 2 */
              /* last cacheline: 12 bytes */
        };
      
      So we use 12 bytes of ->data[] in struct. However in BPF code running parser
      and verdict the user has read capabilities into the data[] array as well. Its
      not too problematic, but we should not be exposing internal state to BPF
      program. If its really needed then we can use the probe_read() APIs which allow
      reading kernel memory. And I don't believe cb[] layer poses any API breakage by
      moving this around because programs can't depend on cb[] across layers.
      
      In order to fix another issue with a ctx rewrite we need to stash a temp
      variable somewhere. To make this work cleanly this patch builds a cb struct
      for sk_skb types called sk_skb_cb struct. Then we can use this consistently
      in the strparser, sockmap space. Additionally we can start allowing ->cb[]
      write access after this.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: NJussi Maki <joamaki@gmail.com>
      Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-5-john.fastabend@gmail.com
      e0dc3b93
    • J
      bpf, sockmap: Fix race in ingress receive verdict with redirect to self · c5d2177a
      John Fastabend 提交于
      A socket in a sockmap may have different combinations of programs attached
      depending on configuration. There can be no programs in which case the socket
      acts as a sink only. There can be a TX program in this case a BPF program is
      attached to sending side, but no RX program is attached. There can be an RX
      program only where sends have no BPF program attached, but receives are hooked
      with BPF. And finally, both TX and RX programs may be attached. Giving us the
      permutations:
      
       None, Tx, Rx, and TxRx
      
      To date most of our use cases have been TX case being used as a fast datapath
      to directly copy between local application and a userspace proxy. Or Rx cases
      and TxRX applications that are operating an in kernel based proxy. The traffic
      in the first case where we hook applications into a userspace application looks
      like this:
      
        AppA  redirect   AppB
         Tx <-----------> Rx
         |                |
         +                +
         TCP <--> lo <--> TCP
      
      In this case all traffic from AppA (after 3whs) is copied into the AppB
      ingress queue and no traffic is ever on the TCP recieive_queue.
      
      In the second case the application never receives, except in some rare error
      cases, traffic on the actual user space socket. Instead the send happens in
      the kernel.
      
                 AppProxy       socket pool
             sk0 ------------->{sk1,sk2, skn}
              ^                      |
              |                      |
              |                      v
             ingress              lb egress
             TCP                  TCP
      
      Here because traffic is never read off the socket with userspace recv() APIs
      there is only ever one reader on the sk receive_queue. Namely the BPF programs.
      
      However, we've started to introduce a third configuration where the BPF program
      on receive should process the data, but then the normal case is to push the
      data into the receive queue of AppB.
      
             AppB
             recv()                (userspace)
           -----------------------
             tcp_bpf_recvmsg()     (kernel)
               |             |
               |             |
               |             |
             ingress_msgQ    |
               |             |
             RX_BPF          |
               |             |
               v             v
             sk->receive_queue
      
      This is different from the App{A,B} redirect because traffic is first received
      on the sk->receive_queue.
      
      Now for the issue. The tcp_bpf_recvmsg() handler first checks the ingress_msg
      queue for any data handled by the BPF rx program and returned with PASS code
      so that it was enqueued on the ingress msg queue. Then if no data exists on
      that queue it checks the socket receive queue. Unfortunately, this is the same
      receive_queue the BPF program is reading data off of. So we get a race. Its
      possible for the recvmsg() hook to pull data off the receive_queue before the
      BPF hook has a chance to read it. It typically happens when an application is
      banging on recv() and getting EAGAINs. Until they manage to race with the RX
      BPF program.
      
      To fix this we note that before this patch at attach time when the socket is
      loaded into the map we check if it needs a TX program or just the base set of
      proto bpf hooks. Then it uses the above general RX hook regardless of if we
      have a BPF program attached at rx or not. This patch now extends this check to
      handle all cases enumerated above, TX, RX, TXRX, and none. And to fix above
      race when an RX program is attached we use a new hook that is nearly identical
      to the old one except now we do not let the recv() call skip the RX BPF program.
      Now only the BPF program pulls data from sk->receive_queue and recv() only
      pulls data from the ingress msgQ post BPF program handling.
      
      With this resolved our AppB from above has been up and running for many hours
      without detecting any errors. We do this by correlating counters in RX BPF
      events and the AppB to ensure data is never skipping the BPF program. Selftests,
      was not able to detect this because we only run them for a short period of time
      on well ordered send/recvs so we don't get any of the noise we see in real
      application environments.
      
      Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: NJussi Maki <joamaki@gmail.com>
      Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-4-john.fastabend@gmail.com
      c5d2177a
    • J
      bpf, sockmap: Remove unhash handler for BPF sockmap usage · b8b8315e
      John Fastabend 提交于
      We do not need to handle unhash from BPF side we can simply wait for the
      close to happen. The original concern was a socket could transition from
      ESTABLISHED state to a new state while the BPF hook was still attached.
      But, we convinced ourself this is no longer possible and we also improved
      BPF sockmap to handle listen sockets so this is no longer a problem.
      
      More importantly though there are cases where unhash is called when data is
      in the receive queue. The BPF unhash logic will flush this data which is
      wrong. To be correct it should keep the data in the receive queue and allow
      a receiving application to continue reading the data. This may happen when
      tcp_abort() is received for example. Instead of complicating the logic in
      unhash simply moving all this to tcp_close() hook solves this.
      
      Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: NJussi Maki <joamaki@gmail.com>
      Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-3-john.fastabend@gmail.com
      b8b8315e
    • J
      bpf, sockmap: Use stricter sk state checks in sk_lookup_assign · 40a34121
      John Fastabend 提交于
      In order to fix an issue with sockets in TCP sockmap redirect cases we plan
      to allow CLOSE state sockets to exist in the sockmap. However, the check in
      bpf_sk_lookup_assign() currently only invalidates sockets in the
      TCP_ESTABLISHED case relying on the checks on sockmap insert to ensure we
      never SOCK_CLOSE state sockets in the map.
      
      To prepare for this change we flip the logic in bpf_sk_lookup_assign() to
      explicitly test for the accepted cases. Namely, a tcp socket in TCP_LISTEN
      or a udp socket in TCP_CLOSE state. This also makes the code more resilent
      to future changes.
      Suggested-by: NJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-2-john.fastabend@gmail.com
      40a34121
  3. 08 11月, 2021 9 次提交
  4. 07 11月, 2021 11 次提交
  5. 06 11月, 2021 2 次提交
    • N
      ipv6: remove useless assignment to newinet in tcp_v6_syn_recv_sock() · 70bf363d
      Nghia Le 提交于
      The newinet value is initialized with inet_sk() in a block code to
      handle sockets for the ETH_P_IP protocol. Along this code path,
      newinet is never read. Thus, assignment to newinet is needless and
      can be removed.
      Signed-off-by: NNghia Le <nghialm78@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211104143740.32446-1-nghialm78@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      70bf363d
    • J
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 9bea6aa4
      Jakub Kicinski 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-11-05
      
      We've added 15 non-merge commits during the last 3 day(s) which contain
      a total of 14 files changed, 199 insertions(+), 90 deletions(-).
      
      The main changes are:
      
      1) Fix regression from stack spill/fill of <8 byte scalars, from Martin KaFai Lau.
      
      2) Fix perf's build of bpftool's bootstrap version due to missing libbpf
         headers, from Quentin Monnet.
      
      3) Fix riscv{32,64} BPF exception tables build errors and warnings, from Björn Töpel.
      
      4) Fix bpf fs to allow RENAME_EXCHANGE support for atomic upgrades on sk_lookup
         control planes, from Lorenz Bauer.
      
      5) Fix libbpf's error reporting in bpf_map_lookup_and_delete_elem_flags() due to
         missing libbpf_err_errno(), from Mehrdad Arshad Rad.
      
      6) Various fixes to make xdp_redirect_multi selftest more reliable, from Hangbin Liu.
      
      7) Fix netcnt selftest to make it run serial and thus avoid conflicts with other
         cgroup/skb selftests run in parallel that could cause flakes, from Andrii Nakryiko.
      
      8) Fix reuseport_bpf_numa networking selftest to skip unavailable NUMA nodes,
         from Kleber Sacilotto de Souza.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        riscv, bpf: Fix RV32 broken build, and silence RV64 warning
        selftests/bpf/xdp_redirect_multi: Limit the tests in netns
        selftests/bpf/xdp_redirect_multi: Give tcpdump a chance to terminate cleanly
        selftests/bpf/xdp_redirect_multi: Use arping to accurate the arp number
        selftests/bpf/xdp_redirect_multi: Put the logs to tmp folder
        libbpf: Fix lookup_and_delete_elem_flags error reporting
        bpftool: Install libbpf headers for the bootstrap version, too
        selftests/net: Fix reuseport_bpf_numa by skipping unavailable nodes
        selftests/bpf: Verifier test on refill from a smaller spill
        bpf: Do not reject when the stack read size is different from the tracked scalar size
        selftests/bpf: Make netcnt selftests serial to avoid spurious failures
        selftests/bpf: Test RENAME_EXCHANGE and RENAME_NOREPLACE on bpffs
        selftests/bpf: Convert test_bpffs to ASSERT macros
        libfs: Support RENAME_EXCHANGE in simple_rename()
        libfs: Move shmem_exchange to simple_rename_exchange
      ====================
      
      Link: https://lore.kernel.org/r/20211105165803.29372-1-daniel@iogearbox.netSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      9bea6aa4
  6. 05 11月, 2021 6 次提交